TECHNICAL FIELD OF THE INVENTION

The present invention relates generally to the field of 
computers and, more particularly, to a system and method for 
product data standardization. 



BACKGROUND OF THE INVENTION 

In general, meals which are "prepared away from home" are 
provided by the food service industry. The food service industry 
involves a number of different entities or participants, 
including manufacturers, distributors, and operators. 
Manufacturers (such as, for example, dairies, bakeries, and
farms) produce the products from which meals are prepared.
Distributors act as "middle men" to consolidate the products from
a number of manufacturers and deliver the same to operators.
Operators (which include restaurants, hotels, school cafeterias,
airlines, etc.) use the products to actually prepare and/or serve
meals to consumers. Operators may have multiple locations or
"units" at which services are rendered or provided (i.e., where
meals are prepared and/or made available to the consumers).

Although the food service industry represents a significant 
share of all retail food sales, it is rife with inefficiencies. 
The greatest challenge the food service industry faces today is 
streamlining all areas of the supply chain to improve the
profitability of all the participants (e.g., operators,
distributors, and manufacturers). For example, for multi-unit
food service operators, food is the most important raw material
for business, and thus, its purchase is a mission-critical,
strategic operation. In the low-margin food service industry,
reducing food costs by one percent can yield a twenty percent or
more increase in revenue. Operators may thus seek volume
discounts for products of a particular manufacturer or
distributor. Furthermore, operators may establish "preferred"
suppliers (manufacturers or distributors) from which products
should be ordered. Distributors and manufacturers, too, can 
leverage efficient purchasing initiatives to drive down their 
costs and compete more effectively. 

In order to implement more effective and efficient 
purchasing strategies, participants require particular
information. To date, this type of information, commonly
available in retail/consumer segments (e.g., grocery industry),
has been absent from the food service industry. Specifically, a
major technical impediment to seamless transaction and
information flow between trading partners in the food service
industry has been a lack of standards for identifying products.
In the grocery industry, which is generally responsible for food
"prepared at home," almost any given item is identifiable by a
respective standard universal product code (UPC) that is
understood and accepted at any point in the supply chain,
including any checkout scanner. In the food service industry,
however, the same item may be described in a number of different
ways by various distributors who supply the item to operators.
For each distributor, the same item may carry a different product
description, manufacturer identifier, product number, pack, and
size description.

Without unified, multi-distributor purchase management
reporting and analysis, food service operators cannot proactively
manage purchasing activities or move forward with initiatives
(e.g., volume discounts or rebates) that positively impact
company profitability. For food service operators, purchasing is
the daily mission-critical job that can mean the difference
between profit and loss, especially for multi-unit operators
which must coordinate purchases from multiple locations with
multiple vendors. Corporate purchasing standards need to be
controlled across all units to ensure consistent food quality and
to obtain the maximum in volume buying power. But the lack of
consistency in product information makes it difficult to bring
new efficiencies and control to the purchasing process. For
example, off-contract purchases undermine the buying efforts but
are difficult to detect or prevent.

This lack of consistency in product information also 
presents a major barrier to food service manufacturers. Without 
a single, consistent standard identifier for its products 
throughout the food service channels, manufacturers are stymied 
in their efforts to track purchasing patterns, market share 
statistics, promotional activities, and more. Instead of a 
single identifier, there are a myriad of identifiers for 
identical products, making data aggregation a time-consuming, 
error-prone nightmare for business analysts. Without the basic 
data foundation, manufacturers have had to rely on educated 
guesses and hunches about what the best markets are for their 
products and how their products compare against the competition. 

SUMMARY OF THE INVENTION 

In order for participants in the food service industry to 
optimize efficiency in their operations, for example, in the 
areas of marketing, distribution, and purchasing, the present 
invention provides a computer system and method for standardizing 
the raw data generated by diverse data sources during the 
movement of products across various supply chains. The system
and method standardize product data, identifying manufacturers
and brands of products described in many different formats and
assigning appropriate standardized product codes. That is, the
system and method generate standardized product data in which 
similar products are identified by the same identifier or 
description. Standardized product data is critical for 
streamlining the supply chains for products in the food service 
industry. 

According to an embodiment of the present invention, a 
computer system is provided for generating standardized product 
data. The computer system includes a database which maintains 
data for a plurality of known products, each known product 
associated with a respective standardized product code. A 
processing facility, coupled to the database, receives raw data 
for an unidentified product from a plurality of diverse data 
sources, each of which has its own separate identifier for the 
unidentified product. The processing facility compares the raw 
data for the unidentified product against the data for the 
plurality of known products. If there is a match between the raw 
data for the unidentified product and the data for one of the 
plurality of known products, the processing facility assigns the 
respective standardized product code of the matching known 
product to the unidentified product. 

According to another embodiment of the present invention, a 
method performed on a computer system is provided for generating 
standardized product data. The method includes the following 
steps: maintaining data for a plurality of known products, each 
known product associated with a respective standardized product
code; receiving raw data for an unidentified product from a
plurality of diverse data sources, each data source having its 
own separate identifier for the unidentified product; comparing 
the raw data for the unidentified product against the data for 
the plurality of known products; and if there is a match between 
the raw data for the unidentified product and the data for one of 
the plurality of known products, assigning the respective 
standardized product code of the matching known product to the 
unidentified product. 

According to yet another embodiment of the present 
invention, a computer system for generating standardized product 
data includes a database operable to maintain data for a 
plurality of known products, each of which is associated with a 
respective standardized product code. The data maintained in the 
database comprises a separate stored description and set of field 
values for each of the known products. A processing facility, 
coupled to the database, receives raw data for an unidentified 
product from a plurality of diverse data sources, each of which 
has its own separate identifier for the unidentified product. 
The raw data comprises a raw description and set of field values 
for the unidentified product. The processing facility compares 
the raw description for the unidentified product against the 
stored descriptions for each of the known products. If the raw 
description for the unidentified product does not match any of 
the stored descriptions for the known products, the processing 
facility compares a predetermined combination of the field values
for the unidentified product against corresponding field values
for each of the known products. If the raw description for the 
unidentified product matches a stored description for one of the 
known products, or alternatively, if all of the field values for 
the unidentified product match the corresponding field values for 
one of the known products for the predetermined combination, the 
processing facility assigns the respective standardized product 
code of the matching known product to the unidentified product. 
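
By way of illustration only, the two-stage comparison of this
embodiment (description first, then a predetermined combination of
field values) might be sketched in Python as follows. This is a
minimal sketch under stated assumptions, not the actual
implementation: the names KnownProduct, assign_code, and COMBINATION,
and the particular fields placed in the combination, are hypothetical
and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Sequence


@dataclass
class KnownProduct:
    code: str                      # standardized product code
    description: str               # stored description
    fields: Dict[str, str] = field(default_factory=dict)  # stored field values


# Hypothetical "predetermined combination" of fields (an assumption).
COMBINATION = ("manufacturer_name", "product_number", "pack_size")


def assign_code(
    raw_description: str,
    raw_fields: Dict[str, str],
    known_products: Iterable[KnownProduct],
    combination: Sequence[str] = COMBINATION,
) -> Optional[str]:
    """Return the standardized code of the matching known product, or None."""
    known = list(known_products)

    # Stage 1: compare the raw description against each stored description.
    for product in known:
        if raw_description == product.description:
            return product.code

    # Stage 2: no description matched, so compare the predetermined
    # combination of field values; every field in it must match.
    for product in known:
        if all(
            name in raw_fields
            and name in product.fields
            and raw_fields[name] == product.fields[name]
            for name in combination
        ):
            return product.code

    return None  # no automated match in this sketch
```

A return value of None in this sketch simply means that neither
comparison succeeded for the given raw data.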

A technical advantage of the present invention includes 
providing a system and method which are able to assign a 
standardized product code identifier to identical products 
described in raw data received from a plurality of diverse data 
sources, each source having its own separate description for the 
products. Because identical products are given the same 
identifier, an operator can monitor the performance of its 
distributors with reports that track deliveries, substitutions, 
shorts and other statistics for individual distributors. In 
addition, operators can consolidate all of their food buying 
activities into a single, unified purchasing process. 
Furthermore, with the standardized product data generated by the 
system and method of the present invention, manufacturers are 
able to more readily track purchasing patterns, market share 
statistics, promotional activities, etc. 

Another technical advantage of the present invention 
includes providing multiple levels of automated matching to 
identify products specified in raw data and to assign appropriate
standardized product codes to the identified products. These
levels of matching include a signature match, a combination match,
and a pattern match. In a signature match, a raw description or
"signature" for an unidentified product specified in raw data is
compared to the signatures of known products. In a combination
match, various field values in the raw data for an unidentified
product are compared against the field values of known products
for one or more predetermined combinations of fields. In a
pattern match, partial matches are calculated, i.e., the field
values for an unidentified product are compared against the field 
values of various known products to determine the fractional 
similarity therebetween. The multiple levels of automated 
matching are designed to reduce or eliminate the need for manual 
analysis to identify a particular product specified in raw data. 
That is, manual analysis is required only if a product cannot be 
identified by one of the levels of automated matching. 

Yet another technical advantage of the present invention 
includes providing a system and method which use a pattern match 
for identifying products. In a pattern match, for each of a 
number of fields, a comparison is made to gauge the similarity 
between the field value of an unidentified product and the field 
value of a known product. If there is sufficient similarity 
between values for each of the fields, the unidentified product 
can be identified as the known product. This is the case even if 
the values are not exact (i.e., Boolean) matches in each field. 
The pattern match of fields affords several benefits. Close
matches are no longer viewed as mismatches. Because the system
and method are able to determine close (albeit, not exact) 
matches, more products can be automatically identified. The 
unidentified products requiring manual attention will be limited 
to only those that have field values with significant differences 
to all known products. Furthermore, the pattern match eliminates 
the need to maintain data for every single variation of field 
value, as would be required for a Boolean comparison. This 
improves system performance. 
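
As a rough illustration of the pattern match, the following Python
sketch scores the similarity of each field value and accepts a known
product only if every compared field is sufficiently similar. It is
a sketch under assumptions, not the method of the invention: the
similarity measure (the ratio computed by difflib.SequenceMatcher
from the Python standard library), the 0.8 threshold, and the names
field_similarity and pattern_match are all chosen here for
illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence


def field_similarity(raw_value: str, known_value: str) -> float:
    """Fractional similarity in [0.0, 1.0], ignoring letter case."""
    return SequenceMatcher(None, raw_value.upper(), known_value.upper()).ratio()


def pattern_match(
    raw_fields: Dict[str, str],
    known_fields: Dict[str, str],
    fields: Sequence[str],
    threshold: float = 0.8,  # assumed cut-off for "sufficient similarity"
) -> bool:
    """True if every listed field is present and similar enough."""
    for name in fields:
        if name not in raw_fields or name not in known_fields:
            return False
        if field_similarity(raw_fields[name], known_fields[name]) < threshold:
            return False
    return True
```

With such a measure, a misspelled value such as "KETCHP" can still be
treated as a close match to "KETCHUP" without storing the misspelling
as a separate known value.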

Other aspects and advantages of the present invention will 
become apparent from the following descriptions and accompanying 
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

For a more complete understanding of the present invention 
and for further features and advantages, reference is now made to 
the following description taken in conjunction with the 
accompanying drawings, in which: 

Fig. 1 illustrates an exemplary environment in which a 
product data standardization system, according to an embodiment 
of the present invention, may operate; 

Fig. 2 illustrates a product data standardization system, 
according to an embodiment of the present invention; 

Fig. 3 is a block diagram for a data receiving component, 
according to an embodiment of the present invention;

Fig. 4 is a block diagram for a data analysis component,
according to an embodiment of the present invention;

Fig. 5 illustrates an exemplary computer-based system for
implementing the product data standardization system;

Fig. 6 illustrates exemplary raw data in flat file format;

Fig. 7 illustrates an exemplary screen display for
manufacturer assignment and audit, according to an embodiment of
the present invention;

Fig. 8 illustrates an exemplary screen display for
standardized product code assignment and audit, according to an
embodiment of the present invention;

Fig. 9 illustrates an exemplary screen display for
standardized product code creation, according to an embodiment of
the present invention;

Fig. 10 is a flow diagram of an exemplary method for
standardizing product data, according to an embodiment of the
present invention;

Fig. 11 is a flow diagram of an exemplary method for
performing a signature match, according to an embodiment of the
present invention;

Fig. 12 is a flow diagram of an exemplary method for
matching a combination of fields for a product, according to an
embodiment of the present invention;

Fig. 13 is a flow diagram of an exemplary method for
matching a combination of fields for a manufacturer, according to
an embodiment of the present invention;

Fig. 14 is a flow diagram of an exemplary method for
generating a guess as to the identity of a product, according to
an embodiment of the present invention; and

Fig. 15 is a flow diagram of an exemplary method for
performing a pattern match, according to an embodiment of the
present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

The preferred embodiments of the present invention and their 
advantages are best understood by referring to Figs. 1-15 of the
drawings. Like numerals are used for like and corresponding
parts of the various drawings.

Turning first to the nomenclature of the specification, the 
detailed description which follows is represented largely in 
terms of processes and symbolic representations of operations
performed by conventional computer components, such as a local or
remote central processing unit (CPU) or processor associated with
a general purpose computer system, memory storage devices for the
processor, and connected local or remote pixel-oriented display
devices. These operations include the manipulation of data bits
by the processor and the maintenance of these bits within data
structures resident in one or more of the memory storage devices.
Such data structures impose a physical organization upon the
collection of data bits stored within computer memory and
represent specific electrical or magnetic elements. These
symbolic representations are the means used by those skilled in
the art of computer programming and computer construction to most
effectively convey teachings and discoveries to others skilled in
the art.

For purposes of this discussion, a process, method, routine, 
or sub-routine is generally considered to be a sequence of
computer-executed steps leading to a desired result. These steps
generally require manipulations of physical quantities. Usually,
although not necessarily, these quantities take the form of
electrical, magnetic, or optical signals capable of being stored,
transferred, combined, compared, or otherwise manipulated. It is
conventional for those skilled in the art to refer to these 
signals as bits, values, elements, symbols, characters, text, 
terms, numbers, records, files, or the like. It should be kept 
in mind, however, that these and some other terms should be 
associated with appropriate physical quantities for computer 
operations, and that these terms are merely conventional labels 
applied to physical quantities that exist within and during 
operation of the computer. 

It should also be understood that manipulations within the
computer are often referred to in terms such as adding,
comparing, moving, searching, or the like, which are often
associated with manual operations performed by a human operator.
It must be understood that no involvement of the human operator
may be necessary, or even desirable, in the present invention.
The operations described herein are machine operations performed
in conjunction with the human operator or user that interacts
with the computer or computers.

In addition, it should be understood that the programs, 
processes, methods, and the like, described herein are but an 
exemplary implementation of the present invention and are not
related, or limited, to any particular computer, apparatus, or
computer language. Rather, various types of general purpose
computing machines or devices may be used with programs
constructed in accordance with the teachings described herein.
Similarly, it may prove advantageous to construct a specialized
apparatus to perform the method steps described herein by way of
dedicated computer systems with hard-wired logic or programs
stored in non-volatile memory, such as read-only memory (ROM).

Supply Chains For the Food Service Industry

Referring now to the drawings, Fig. 1 illustrates an
exemplary environment in which a product data standardization
system 10, according to an embodiment of the present invention,
may operate. In particular, Fig. 1 depicts a number of supply
chains formed by various participants of the food service
industry, including manufacturers 12 (separately labeled 12a,
12b, and 12c), distributors 14 (separately labeled 14a, 14b, and
14c), and operators 16 (separately labeled 16a, 16b, and 16c).

Manufacturers manufacture or produce the products which are 
moved in the supply chains for the food service industry. These
products may include any consumable items which are used in the
preparation and/or service of meals "prepared away from home."
For example, the products may include food, such as flour, milk,
eggs, meat, poultry, fish, vegetables, fruit, bread, condiments,
processed sauces, seasonings, etc. The products may also include
serving items, such as plates, glasses, cups, china, utensils,
serving trays, napkins, tablecloths, take-out containers, etc.
Although some of the products may have a universal product code
(UPC) symbol by which they can be identified, many of the
products do not. For example, a bottle of ketchup of a certain
size from a particular manufacturer may have a UPC symbol, but a
plate may not.

Distributors 14 consolidate and distribute the products from 
a number of manufacturers 12. In many instances, more than one 
distributor 14 may distribute the products of a given 
manufacturer 12. Each distributor 14 may have one or more 
distribution units 18. As shown, distributors 14a and 14c each
have a single distribution unit 18, whereas distributor 14b has
multiple distribution units 18. Each distribution unit 18 may 
comprise a warehouse facility for temporarily housing products 
and one or more transport vehicles for delivering the products. 

Operators 16 receive the products from one or more 
distributors 14. In many cases, an operator 16 may receive 
identical products manufactured by the same manufacturer 12 from 
multiple distributors 14. Each operator 16 may have one or more
operating units 20. As shown, operators 16a and 16c have multiple
operating units 20, whereas operator 16b has but a single
operating unit 20. Each operating unit 20 can be a location or
facility at which meals are prepared and/or served using the
products.

Each participant forming part of one or more supply chains 
may have its own identifier and/or description for identifying a
particular product. For example, a manufacturer 12 of a fourteen
ounce bottle of ketchup may describe such product as "TOMATO
KETCHUP BOTTLE FANCY GRADE." A distributor 14 for the same
product may describe it as "TOM CATSUP BOTTLE." An operator 16
receiving the same product may describe it as "KETCHUP BOTTLE
PLASTIC FANCY." Furthermore, for distributors 14 and operators
16 having multiple units, each individual distribution unit 18 or
operating unit 20 may have its own separate identifier or
description for the product. Thus, there may be no uniformity of
identifier/description for a product even within the organization
of a particular participant.

As products are moved through the supply chains from 
manufacturers 12 to distributors 14 to operators 16, various data 
and information are generated by each of the participants (or its 
units) to document the relevant transactions. This data and
information may appear, for example, in purchase orders,
invoices, bills of sale, receipts, catalogs, brochures, etc., and
may specify products bought or sold, amounts for each product,
dates of purchase/sale, dates of delivery, the participants
selling products, the participants purchasing products, locations
from which products were shipped, locations to which products are
delivered, carriers for delivery of products, etc. This
data/information constitutes "raw data," and any participant (or 
smaller unit) generating or outputting the same constitutes a 
"data source." 

The raw data produced by any data source typically
incorporates that data source's identifiers or descriptions for
the products which are bought or sold. Because the product
identifiers/descriptions may differ between data sources, the raw
data generated within the supply chains for the food service
industry lacks consistency. Accordingly, this raw data may not
be very useful to manufacturers 12, distributors 14, or operators
16 which are interested in the total amounts of products
bought/sold, amounts of each product purchased from or sold to a
particular participant, amount of each product bought/sold
off-contract, etc.

In order for participants in the food service industry to
optimize efficiency in their operations (for example, in the
areas of marketing, distribution, and purchasing), product data
standardization system 10, according to an embodiment of the
present invention, is provided. Product data standardization
system 10 generally functions to receive the raw data generated
by the diverse data sources and to generate standardized data for
the products which are moved through the supply chains. In the
standardized product data, like products are identified by the
same identifier or description. Standardized product data is
critical for streamlining the supply chains for products in the
food service industry. Once the standardized product data has
been generated, the participants can access such data from system
10.

To accomplish this, any of manufacturers 12, distributors
14, and operators 16 may interact with product data
standardization system 10 via the Internet 22. Internet 22 is an
interconnection of computer "clients" and "servers" located
throughout the world and exchanging information according to
Transmission Control Protocol/Internet Protocol (TCP/IP),
Internetwork Packet eXchange/Sequenced Packet eXchange (IPX/SPX),
AppleTalk, or other suitable protocol. Internet 22 supports the
distributed application known as the "World Wide Web." Web
servers maintain websites, each comprising one or more web pages
at which information is made available for viewing. Each website
or web page can be identified by a respective uniform resource
locator (URL) and may be supported by documents formatted in any
suitable language, such as, for example, hypertext markup
language (HTML), extensible markup language (XML), or standard
generalized markup language (SGML). Clients may locally execute
a "web browser" or "web proxy" program. A web browser is a
computer program that allows the exchange of information with the
World Wide Web. Any of a variety of web browsers are available,
such as NETSCAPE NAVIGATOR from Netscape Communications Corp.,
INTERNET EXPLORER from Microsoft Corporation, and others that
allow convenient access and navigation of the Internet 22.
Information may be communicated from a web server to a client
using a suitable protocol, such as, for example, Hypertext
Transfer Protocol (HTTP) or File Transfer Protocol (FTP).

With the standardized product data generated by product data 
standardization system 10, as described herein, participants in 
the food service industry can better monitor, manage, control,
consolidate, organize, or otherwise analyze the products which 
they manufacture, distribute, or use in the food service 
industry. 

Operators 16 can monitor purchasing patterns of their
operating units 20, for example, to identify or detect
off-contract buying. Once detected, measures can be taken to reduce
or eliminate off-contract buying activity. This maximizes rebate
capture and ensures product consistency from operating unit 20 to
operating unit 20. Furthermore, operators 16 can improve the
accuracy of their ordering and prevent purchasing errors that
result in shortages, or alternatively, excess inventory.
Distributors 14 can offer their customers the ability to review
purchase histories and check the status of ordering on a
corporate-wide basis. Manufacturers 12 can monitor the
performance of their distributors 14 with reports that track
deliveries, substitutions, shorts, etc., for individual
distributors. Also, manufacturers 12 can identify the markets in
which various products are most successful, and thus design or
target promotions to advance further market penetration.
Furthermore, manufacturers 12 can see how well their products
sell relative to their competitors' products.
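
To make the off-contract monitoring concrete, the following Python
sketch totals, per operating unit and standardized product code, the
quantities bought from suppliers outside a preferred list. It is
offered only as a sketch: the record layout (keys "unit", "supplier",
"code", and "quantity"), the preferred-supplier table, and the
function name off_contract_purchases are assumptions made for this
example and are not taken from the specification.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple


def off_contract_purchases(
    invoice_lines: Iterable[Mapping[str, object]],
    preferred_suppliers: Mapping[str, Set[str]],
) -> Dict[Tuple[str, str], int]:
    """Total off-contract quantity keyed by (operating unit, product code).

    Each invoice line is assumed to already carry a standardized
    product code, which is what makes lines from different
    distributors comparable.
    """
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for line in invoice_lines:
        allowed = preferred_suppliers.get(line["code"])
        # Products with no preferred-supplier entry are not flagged.
        if allowed is not None and line["supplier"] not in allowed:
            totals[(line["unit"], line["code"])] += line["quantity"]
    return dict(totals)
```

Such a tally is only meaningful once identical products from
different data sources share the same standardized product code.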

Although the present invention is generally applicable to
any environment in which products are moved through various
supply chains with each participant having its own identifier or
description for the same product, the following description
focuses on the food service industry in order to make the
inventive concept more concrete. It should be understood, 
however, that such focus is not intended, nor should be 
construed, to limit the scope of the present invention. 

Product Data Standardization System

Fig. 2 illustrates a product data standardization system 10,
according to an embodiment of the present invention. Product
data standardization system 10 functions to generate standardized
product data from raw data generated by diverse data sources
during the movement of products across various supply chains, for
example, in the food service industry. Product data
standardization system 10 can be maintained by a participant in
the supply chains (e.g., manufacturer 12, distributor 14, or
operator 16), or by any entity offering analytical services to
one or more participants. As shown, product data standardization
system 10 includes a data receiving component 30, an operational
data store (ODS) database 32, a data analysis component 34, one
or more analyst interfaces 36 (separately labeled 36a, 36b, and
36c), and a data warehouse 38.

Data receiving component 30 receives raw data from one or
more data sources. These data sources may include various
participants of the food service industry (e.g., manufacturers
12, distributors 14, and operators 16) as well as their smaller
units (e.g., distribution units 18 and operating units 20). In
one embodiment, the raw data can be packaged at the respective
data source in one or more files suitable for transfer, for
example, using File Transfer Protocol (FTP) or Hypertext Transfer
Protocol (HTTP).

The raw data may include information relating to products 
offered for sale or purchased by various participants, invoices 
documenting the sales/purchases, and accounts under which the
relevant transactions are made. Product data can specify, for
example, product identifiers or descriptions, quantities of sale
(e.g., individually or by the case), prices for the products,
ordering numbers, catalogs in which products are offered, etc.
Invoice data can specify transaction level details including, for
example, invoice number, invoice date, participants to the
transactions, products which were ordered, quantity for each
product, scheduled delivery date, actual delivery date, problems
with an order (e.g., defective products), etc. Account data can
relate to the organizational structure for one or more
participants and may specify, for example, address and contact
for main office or headquarters, address and contact for various
units (e.g., operating units or distribution units) of each
participant, preferred suppliers or purchasers for each
participant, etc.

The raw data may comprise values for any number of data
fields which are appropriate in the food service industry. These
fields may include product name, product number, product
identifier, manufacturer name, manufacturer number, manufacturer
identifier, brand name, brand identifier, brand code, distributor
name, distributor number, pack, pack size, etc. Each such field
may have a particular value. For example, a product name field
may have one of the following values: "ketchup," "milk," "eggs,"
"flour," etc. Likewise, a pack size field may have one of the
following values: "1 pt," "4 qt," "14 oz," "1 lb," "5 lbs," etc.
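
For illustration, a record holding some of the fields listed above,
together with a helper that splits a pack size value into a number
and a unit, might be sketched in Python as follows. The class name
RawProductRecord, the subset of fields shown, the helper
parse_pack_size, and the treatment of "lbs" as "lb" are assumptions
made for this sketch rather than details of the specification.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RawProductRecord:
    """A subset of the raw data fields, all kept as received text."""
    product_name: str
    product_number: str
    manufacturer_name: str
    brand_name: str
    distributor_name: str
    pack: str
    pack_size: str


# A number, optional whitespace, then an alphabetic unit, e.g. "14 oz".
_PACK_SIZE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_pack_size(value: str) -> Optional[Tuple[float, str]]:
    """Split a value such as "14 oz" into (14.0, "oz"); None if unparseable."""
    match = _PACK_SIZE.match(value)
    if match is None:
        return None
    unit = match.group(2).lower()
    if unit == "lbs":  # assumed normalization of the plural form
        unit = "lb"
    return float(match.group(1)), unit
```

Values that do not follow the number-then-unit shape are returned as
None in this sketch, leaving their handling to later processing.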

Even though the raw data can have details for many products, 
transactions, and accounts, the files in which the raw data is 
received may be "flat files" in which there is no separation, 
division, or delineation as to what any element or piece of data 
represents. Exemplary raw data in flat file format is 
illustrated in Fig. 6. The data of the files may be compressed
to facilitate transfer from the data sources to product data 
standardization system 10. 

Data receiving component 30 generally functions to receive
and process the incoming data files. For each file, data
receiving component 30 may process the raw data contained therein
so that the data appears in a consistent format suitable for
further processing. Data receiving component 30 operates on the
raw data, for example, by removing unnecessary formatting and
validating the data. Data receiving component 30 may attempt to
match a raw description of an unidentified product against the
descriptions for various products known to product data
standardization system 10. If there is a match, the relevant
product has been identified and can be assigned a standardized
product code, at least temporarily, until an audit is performed
(as further described herein).

The functionality of data receiving component 30 can be
performed by any suitable communications hub or router in
combination with any one or more suitable processors, such as a
mainframe, a file server, a workstation, or other suitable data
processing facility supported by memory (either internal or
external), running appropriate software, and operating under the
control of any suitable operating system (OS), such as MS-DOS,
MacINTOSH OS, WINDOWS NT, WINDOWS 95, OS/2, UNIX, LINUX, XENIX,
and the like.

ODS database 32 is connected to data receiving component 30.
As used herein, the terms "connected," "coupled," or any variant
thereof, means any connection or coupling, either direct or
indirect, between two or more elements; such connection or
coupling can be physical or logical. ODS database 32 generally
functions to store the received data after it has been initially
processed by data receiving component 30. ODS database 32 may
also store standardized identifiers or descriptions for various
products which are moved across supply chains and used in the
food service industry. The standardized identifiers/descriptions
can include standardized product codes for uniquely identifying
the products. In an object-oriented implementation for product
data standardization system 10, a separate category may be
provided for each standardized product code. A set of attributes
may characterize each category. For example, attributes for a
"cheese" category can be "natural" or "processed." ODS database
32 may also store information for the categories and respective
sets of attributes.

In addition, ODS database 32 may store and maintain data and
information for a plurality of products and manufacturers which
are "known" to product data standardization system 10. This
known product and manufacturer data can be used to identify
products and manufacturers specified in incoming raw data. For
each of a number of known products, the information may specify,
for example, a raw description for the known product, a universal
product code (UPC) for the known product, values for various data
fields (e.g., product name, product number, manufacturer name,
brand name, pack size, etc.) for the known product, and the like.
For each of a number of known manufacturers, the information may
specify, for example, a raw description for the known
manufacturer, values for various data fields (e.g., manufacturer
name, manufacturer number, brand names, brand numbers, etc.) for
the known manufacturer, and the like. This information can be
received from the data sources, or alternatively, may be
developed within product data standardization system 10 over
time, for example, by a learning algorithm.

ODS database 32 may be implemented with any one or more suitable
storage media, such as random access memory (RAM), read-only
memory (ROM), disk drives, tape storage, or other suitable
volatile and/or non-volatile data storage facility. ODS database
32 may be configured as a relational database.

Data analysis component 34 generally functions to further
process the data received from the diverse data sources. Among
other things, data analysis component 34 may parse or separate
the received data into the distinct field values. Data analysis
component 34 may also attempt to identify a product specified in
the raw data by comparing the field values for the product
against one or more predefined combinations of field values of
known products. Data analysis component 34 may also generate one
or more educated guesses as to the identity of a product. These
guesses can be used in assigning a standardized product code to
the product. Data analysis component 34 also provides for the
creation of new standardized product codes for any new products.
Data analysis component 34 may provide for auditing of each
assignment or creation of a standardized product code.

The functionality of data analysis component 34 can be
performed by any one or more suitable processors, such as a
mainframe, a file server, a workstation, or other suitable data
processing facility supported by memory (either internal or
external), running appropriate software, and operating under the
control of any suitable operating system (OS), such as MS-DOS,
MacINTOSH OS, WINDOWS NT, WINDOWS 95, OS/2, UNIX, LINUX, XENIX,
and the like. Such processors can be the same as or separate from
the processor performing the functionality for data receiving
component 30.

Analyst interfaces 36 are in communication with data
analysis component 34 and generally function to enable human
analysts to interact with the same, for example, to review raw
(or initially processed) data and guesses, and assist in the
assignment and audit of standardized product codes. The
functionality of each analyst interface 36 can be performed by
one or more suitable input devices, such as a key pad, touch
screen, input port, pointing device (e.g., mouse), microphone,
and/or other device that can accept information, and one or more
suitable output devices, such as a computer display, output port,
speaker, or other device, for conveying information, including
digital data, visual information, or audio information. In one
embodiment, each analyst interface 36 may comprise or be operable
to display at least one graphical user interface (GUI) having a
number of interactive devices, such as buttons, windows,
pull-down menus, and the like to facilitate the entry, viewing,
and/or retrieval of information.

Data warehouse 38, which is connected to data analysis
component 34, generally functions to store and maintain the
standardized product data output by data analysis component 34.
Data warehouse 38 can be implemented with any one or more
suitable storage media, such as random access memory (RAM),
read-only memory (ROM), disk drives, tape storage, or other
suitable volatile and/or non-volatile data storage facility. This
data storage facility may be the same as or separate from the
data storage facility implementing ODS database 32.

From data warehouse 38, the standardized product data can be 
made available to various participants in the food service 
industry. Manufacturers 12, distributors 14, and operators 16
may then access the standardized product data, for example, via a
website maintained by the entity operating product data 
standardization system 10, and use the same for their own 
analyses of market trends, purchasing patterns, etc. 

Data Receiving Component 

Fig. 3 is a block diagram for a data receiving component 30,
according to an embodiment of the present invention. As
depicted, data receiving component 30 includes a sender module
50, a receiver module 52, an unpacker module 54, a transformer
module 56, a cleanser module 58, a validator module 60, a
signature matcher module 62, a loader module 64, an account and
unit module 66, and a map segment module 68. Each of these
modules 50 through 68 may comprise one or more programs which,
when executed, perform the functionality described herein.

Sender module 50 and receiver module 52 cooperate to support
the transfer of data and information to and from product data
standardization system 10. These modules may implement or
support various protocols, such as, for example, File Transfer
Protocol (FTP) or Hypertext Transfer Protocol (HTTP). The
data/information may include raw data generated by various data
sources and can be in the form of one or more files. Each such
file of raw data may be compressed to facilitate transfer.

Unpacker module 54 decompresses or "unpacks" the files of
raw data which are received at sender/receiver modules.
Transformer module 56 transforms the raw data, for example, by
applying various meta-data rules to make field breaks in the
data. This places the received data in a consistent format for
further processing. Cleanser module 58 "cleanses" the
transformed data, for example, by removing extraneous formatting
codes added during transfer or compression. Validator module 60
validates the cleansed data to ensure that the items intended to
be represented are valid. Thus, cleanser module 58 assures the
quality of each field in the product data, while validator module
60 assures the quality of each product.
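
The transform, cleanse, and validate stages described above can be sketched as follows. This is only an illustration under stated assumptions: the specification does not give the meta-data rules, so the fixed-width column layout, the particular cleansing steps, and the two validation checks are all hypothetical.

```python
# Hypothetical meta-data rule: (field name, start column, end column).
# The specification gives no layout; these offsets are illustrative only.
LAYOUT = [
    ("dc_prod_num", 0, 7),
    ("dc_prod_name", 7, 37),
    ("dc_pack_qty", 37, 42),
    ("dc_pack_size", 42, 50),
]


def transform(line, layout=LAYOUT):
    """Make field breaks in one flat-file record."""
    return {name: line[start:end] for name, start, end in layout}


def cleanse(record):
    """Drop non-printable characters and collapse whitespace in each field."""
    cleaned = {}
    for name, value in record.items():
        printable = "".join(ch for ch in value if ch.isprintable())
        cleaned[name] = " ".join(printable.split())
    return cleaned


def validate(record):
    """Return a list of problems; an empty list means the product looks valid."""
    problems = []
    if not record["dc_prod_num"]:
        problems.append("missing product number")
    if not record["dc_pack_qty"].isdigit():
        problems.append("pack quantity is not numeric")
    return problems


# One flat record built from the first exemplary signature in the text.
raw = (
    "0185736"
    + "Raisins seedless dark select".ljust(30)
    + "024".ljust(5)
    + "15 OZ".ljust(8)
)
record = cleanse(transform(raw))
problems = validate(record)
```

Here the cleanse step works field by field and the validate step works on the record as a whole, mirroring the division of labor between cleanser module 58 and validator module 60.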

Signature matcher module 62 attempts a "signature match" for
each product specified in the received data. Typically, in the
raw data, each product is described with a raw description having
textual information for the product, its manufacturer, package
size, etc. This raw description may constitute a "signature" for
the product.

There may be different kinds of signatures, such as, for
example, distribution center (DC) signatures, DC product
signatures, account signatures, and transaction signatures. Each
kind of signature may comprise various fields. A DC signature
may have a field for a DC number code (dc_num). A DC product
signature may have fields for DC product number (dc_prod_num),
DC product name (dc_prod_name), DC pack quantity (dc_pack_qty),
DC package size (dc_pack_size), DC brand (dc_brand), DC vendor
number (dc_vendor_num), DC vendor name (dc_vendor_name), DC sell
by unit of measure (dc_sb_uom), and DC price by unit of measure
(dc_pb_uom). An account signature may have fields for DC number
code (dc_num) and account number code (account_num). A
transaction signature may have a field for an identifier of a DC
signature (dc_sig_id) and an identifier of a DC product signature
(dc_prod_sig_id). For each signature, product data
standardization system 10 may assign a numeric identifier
(dc_sig_id, dc_prod_sig_id, acct_sig_id, trx_sig_id).

Thus, in one embodiment for the signature matching process,
signature matcher module 62 receives a raw description for a
product and, in response, outputs numeric identifiers (e.g.,
dc_sig_id, dc_prod_sig_id, acct_sig_id, trx_sig_id) for
the different kinds of signatures (e.g., DC signature, DC product
signature, account signature, and transaction signature)
contained therein.
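
As a rough illustration of how the four kinds of signatures relate, the sketch below assigns one numeric identifier per distinct signature. The field names come from the text; the classes SignatureTable and SignatureMatcher, the sequential numbering, and the sample values "D1" and "A1" are assumptions made for this example only.

```python
from itertools import count

# Fields named in the text for a DC product signature.
DC_PROD_FIELDS = (
    "dc_prod_num", "dc_prod_name", "dc_pack_qty", "dc_pack_size",
    "dc_brand", "dc_vendor_num", "dc_vendor_name", "dc_sb_uom", "dc_pb_uom",
)


class SignatureTable:
    """Hands out one numeric identifier per distinct tuple of field values."""

    def __init__(self):
        self._ids = {}
        self._counter = count(1)

    def id_for(self, values):
        key = tuple(values)
        if key not in self._ids:
            self._ids[key] = next(self._counter)
        return self._ids[key]


class SignatureMatcher:
    """Hypothetical stand-in for the identifier output of module 62."""

    def __init__(self):
        self.dc = SignatureTable()
        self.dc_prod = SignatureTable()
        self.acct = SignatureTable()
        self.trx = SignatureTable()

    def identify(self, row):
        dc_sig_id = self.dc.id_for([row["dc_num"]])
        dc_prod_sig_id = self.dc_prod.id_for(
            row.get(field, "") for field in DC_PROD_FIELDS
        )
        acct_sig_id = self.acct.id_for([row["dc_num"], row["account_num"]])
        # A transaction signature refers to the other signatures by identifier.
        trx_sig_id = self.trx.id_for([dc_sig_id, dc_prod_sig_id])
        return {
            "dc_sig_id": dc_sig_id,
            "dc_prod_sig_id": dc_prod_sig_id,
            "acct_sig_id": acct_sig_id,
            "trx_sig_id": trx_sig_id,
        }


matcher = SignatureMatcher()
row = {
    "dc_num": "D1",          # hypothetical DC number code
    "account_num": "A1",     # hypothetical account number code
    "dc_prod_num": "0185736",
    "dc_prod_name": "Raisins seedless dark select",
}
ids = matcher.identify(row)
```

Presenting the same row again returns the same identifiers, which is what lets a previously seen signature be recognized later.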

Two exemplary signatures are shown in the following table. 

Field name          Example one                    Example two
------------------  -----------------------------  ----------------
KTS ID              500562000                      500644053
DC product number   0185736                        9817611
DC product name     Raisins seedless dark select   Oil, peanut
DC pack quantity    024                            00001
DC pack size        15 OZ                          35 LB
DC brand            PACKER                         BUNGE EDIBLE OIL
DC vendor number    (blank)                        7695/10421
DC vendor name      (blank)                        BUNGE EDIBLE OIL
DC UPC              (blank)                        (blank)
DC SELL BY UOM      CS                             CS
DC Price By UOM     (blank)                        LB
DC Name             US FOODSERVICE - ATLANTA       SHAMROCK FOODS

Product data standardization system 10 may store information for
product signatures that it has previously seen in other raw data
or received from a participant in some other way. Signature
matcher module 62 compares the signature for each product
specified in the raw data against the stored signatures for known
products. If there is an exact match for a received product
signature, signature matcher module 62 will assign an appropriate
standardized product code for the product under consideration.
At a later point, the assigned standardized product code may be
audited for accuracy. If there is not an exact match for a
received product signature, no standardized product code is
assigned to the product at the current time.
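
The exact-match rule above amounts to a keyed lookup. A minimal sketch, assuming signatures are stored as tuples of field values in a dictionary; the function name match_signature, the choice of five fields, and the code "SPC-000123" are hypothetical and do not come from the specification.

```python
def match_signature(signature, known_signatures):
    """Return (code, needs_audit).

    code is None when the signature has no exact match, in which case
    nothing is assigned at this stage.
    """
    code = known_signatures.get(signature)
    if code is None:
        return None, False
    # An assignment made by exact match is still subject to a later audit.
    return code, True


# Stored signature: (product number, name, pack quantity, pack size, brand).
known = {
    ("0185736", "Raisins seedless dark select", "024", "15 OZ", "PACKER"):
        "SPC-000123",  # hypothetical standardized product code
}

exact = match_signature(
    ("0185736", "Raisins seedless dark select", "024", "15 OZ", "PACKER"), known
)
near_miss = match_signature(
    ("0185736", "Raisins seedless dark select", "024", "16 OZ", "PACKER"), known
)
```

A signature that differs in even one field, such as the pack size in the second call, is not matched here and is left for later stages.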

Loader module 64 generally functions to load the received
data into ODS database 32. Account and unit module 66 generally
functions to support the establishment or updating of one or more
accounts. These accounts are provided for participants which
would like to have access to the standardized product data
generated by product data standardization system 10. Each
account for a participant may be divided into sub-accounts for
the units (e.g., operating units or distribution units) of that
participant. Map segment module 68 generally functions to
support the entry or updating of information relating to the
organizational structure of various participants. This
information, for example, may "map" or outline the various
operating units 20 of an operator 16 or the various distribution
units 18 of a distributor 14.

Data Analysis Component 

Fig. 4 is a block diagram for a data analysis component 34,
according to an embodiment of the present invention. As
depicted, data analysis component 34 includes a combination
matcher module 70, a guesser module 72, a manufacturer assigner
module 74, a manufacturer auditor module 76, a standardized
product code (SPC) assigner module 78, a SPC auditor module 80, a
SPC creator module 82, and a SPC creation auditor module 84.
Each of these modules 70 through 84 may comprise one or more
programs which, when executed, perform the functionality
described herein.

Combination matcher module 70 generally functions to match
various fields of the received data for an unidentified product
against one or more predetermined combinations of fields. This
is done to identify the particular product or a manufacturer of
the same. Specifically, a given product or manufacturer may be
uniquely identified by the values it is assigned for certain
fields.

If all field values for an unidentified product specified in
the received data match a particular combination of field values,
a standardized product code can be assigned to the product. For
a product, the following combinations of fields may be used for
matching:

(1) Distributor_id, product_no, product_name, brand_code,
pack, pack_size;

(2) Distributor_id, mfc_product_no, product_name,
brand_code, pack, pack_size; and

(3) Product_view_id, product_no, product_name, brand_code,
sell_by_uom.

Likewise, the manufacturer for a product specified in the
received data can be identified if all field values for the
product match a particular combination of field values. For a
manufacturer, the following combinations of fields may be used
for matching:

(1) Brand_code, product_view_id, raw_mfc_name;

(2) Brand_code, raw_mfc_name, distributor_id;

(3) Brand_code, raw_mfc_name;

(4) Brand_code, product_view_id; and

(5) Brand_code, distributor_id.
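
Combination matching of this kind can be sketched as a series of keyed lookups, one per combination. The example below uses the three product combinations; the lowercase field spellings, the order in which combinations are tried, the shape of the known table, and the code "SPC-000123" are assumptions for illustration. The same pattern would apply to the manufacturer combinations.

```python
# The three product combinations listed in the text, as tuples of field names.
PRODUCT_COMBINATIONS = [
    ("distributor_id", "product_no", "product_name",
     "brand_code", "pack", "pack_size"),
    ("distributor_id", "mfc_product_no", "product_name",
     "brand_code", "pack", "pack_size"),
    ("product_view_id", "product_no", "product_name",
     "brand_code", "sell_by_uom"),
]


def combination_key(record, combination):
    """Return the values for a combination, or None if any field is empty."""
    values = tuple(record.get(field) for field in combination)
    if any(value in (None, "") for value in values):
        return None
    return values


def match_combination(record, known, combinations=PRODUCT_COMBINATIONS):
    """Return the code for the first combination whose values all match.

    known maps (combination, values) to a standardized product code.
    """
    for combination in combinations:
        key = combination_key(record, combination)
        if key is not None and (combination, key) in known:
            return known[(combination, key)]
    return None


third = PRODUCT_COMBINATIONS[2]
known = {
    (third, ("PV7", "0185736", "Raisins seedless dark select", "PACKER", "CS")):
        "SPC-000123",  # hypothetical code and product view identifier
}
record = {
    "product_view_id": "PV7",
    "product_no": "0185736",
    "product_name": "Raisins seedless dark select",
    "brand_code": "PACKER",
    "sell_by_uom": "CS",
}
code = match_combination(record, known)
```

The record above lacks a distributor identifier, so the first two combinations are skipped and the match is made on the third.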



Guesser module 72 generally functions to generate one or
more guesses as to the identity of a product specified in the
received data. Guesser module 72 may output one or more
standardized product descriptions and/or codes for each product
it considers. For each standardized product description or code,
guesser module 72 may also output a respective confidence measure
as to a match between the product under consideration and the
standardized description/code. The confidence measure can be a
normalized value (i.e., between zero and one) that is
monotonically related to similarity. In one embodiment, the
confidence measure can be a percentage value (e.g., 100%, 85%,
20%, etc.) which represents a measure of confidence as to the
certainty of the match. The description/codes and respective
confidence measures generated by guesser module 72 can be used in
assigning a standardized product code to each product, and then
auditing the assignment. In one embodiment, the
description/codes and confidence measures can be presented to one
or more analysts for consideration and review.

Guesser module 72 may perform a pattern match to determine
the similarity of a product specified in the raw data against one
or more known products. Raw data with sufficiently high
similarity with a known product may be automatically assigned a
standardized product code, thereby eliminating the need for
manual intervention. In one embodiment, the fields considered by
guesser module 72 in performing the pattern match include
manufacturer, product_description, brand_code, sell_by_uom, pack,
pack_size, and pricing_uom. Guesser module 72 finds a
predetermined number (e.g., twenty-five) of the most similar
products, and gives the respective standardized product codes as
guesses. Furthermore, guesser module 72 may identify new
signatures or raw descriptions which can be mapped to respective
standardized product codes for use in the future by signature
matcher module 62 of data receiving component 30.
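
A ranked-guess step of this kind might look like the sketch below. The specification does not name a similarity measure, so Python's difflib.SequenceMatcher is swapped in purely as a stand-in that yields a value between zero and one; the auto_assign_at threshold of 0.95, the "spc" key, and the sample codes are likewise assumptions.

```python
from difflib import SequenceMatcher

# Fields the text lists for the pattern match.
GUESS_FIELDS = (
    "manufacturer", "product_description", "brand_code",
    "sell_by_uom", "pack", "pack_size", "pricing_uom",
)


def describe(record):
    """Flatten the compared fields into one lowercase string."""
    return " ".join(str(record.get(field, "")).lower() for field in GUESS_FIELDS)


def guess(record, known_products, limit=25, auto_assign_at=0.95):
    """Return (guesses, auto_code).

    guesses is a list of (confidence, code) pairs, best first, at most
    limit long. auto_code is the best code when its confidence reaches
    the threshold, otherwise None.
    """
    target = describe(record)
    scored = [
        (SequenceMatcher(None, target, describe(known)).ratio(), known["spc"])
        for known in known_products
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    guesses = scored[:limit]
    auto_code = None
    if guesses and guesses[0][0] >= auto_assign_at:
        auto_code = guesses[0][1]
    return guesses, auto_code


known_products = [
    {"spc": "SPC-1", "manufacturer": "Bunge", "product_description": "Oil, peanut",
     "pack": "1", "pack_size": "35 LB"},
    {"spc": "SPC-2", "manufacturer": "Packer",
     "product_description": "Raisins seedless dark select",
     "pack": "24", "pack_size": "15 OZ"},
]
guesses, auto_code = guess(known_products[0], known_products)
```

Multiplying a confidence by 100 gives the percentage form mentioned in the text. Guesses below the threshold would be left for an analyst to review.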

Manufacturer assigner module 74 generally functions to
assign an appropriate standardized manufacturer code to various
products specified in the received data. To accomplish this,
manufacturer assigner module 74 may use the information (e.g., a
list of possible manufacturers) generated by guesser module 72.
In one embodiment, for each unidentified product being
considered, manufacturer assigner module 74 presents information,
such as the list of possible manufacturers, to an analyst for
assistance in making an assignment. Manufacturer auditor module
76 generally functions to audit each assignment of a manufacturer
code. In one embodiment, manufacturer auditor module 76 presents
information, which may include the assigned manufacturer code, to
an analyst for assistance in the audit. This analyst can be the
same as or different from the analyst assisting with the
assignment. An exemplary screen display for manufacturer
assignment and audit, according to an embodiment of the present
invention, is illustrated in Fig. 7.

SPC assigner module 78 generally functions to assign a
standardized product code to various products specified in the
received data. To accomplish this, SPC assigner module 78 may
use the information (e.g., one or more guesses of possible
standardized product codes) generated by guesser module 72. In
one embodiment, for each product being considered, SPC assigner
module 78 presents information, such as the list of guesses of
possible standardized product codes, to an analyst for assistance
in making the assignment. SPC auditor module 80 generally
functions to audit each assignment of a standardized product
code. In one embodiment, SPC auditor module 80 presents
information, which may include the assigned standardized product
code, to an analyst for assistance in the audit. The auditing
analyst can be the same as or different from the analyst
assisting in making the assignment. An exemplary screen display
for standardized product code assignment and audit, according to
an embodiment of the present invention, is illustrated in Fig. 8.

SPC creator module 82 generally functions to create a new
standardized product code, for example, in the event that no
existing standardized product code is appropriate for a product
specified in the received data. In one embodiment, for each
standardized product code being created, SPC creator module 82
presents information, such as a suggested standardized product
code, to an analyst for assistance in creating a new code. SPC
creation auditor module 84 generally functions to audit each
creation of a new standardized product code. In one embodiment,
SPC creation auditor module 84 presents information, which may
include the newly created standardized product code, to an
analyst for assisting in the audit. This analyst can be the same
as or different from the analyst assisting in the creation of the
new code. An exemplary screen display for standardized product
code creation, according to an embodiment of the present
invention, is illustrated in Fig. 9.

Hardware Implementation

Fig. 5 illustrates a computer-based system 90 that is an
exemplary hardware implementation for product data
standardization system 10. In general, computer-based system 90
may include, among other things, a number of processing
facilities, storage facilities, data servers, and workstations.
As depicted, the processing facilities may include process
servers 91 and 95, file servers 93 and 96, a data server 97, and
workgroup servers 94 and 98. In one embodiment, process servers
91 and 95 can be implemented with servers commercially available
from Sun Microsystems. File servers 93 and 96 can be implemented
with any suitable storage solution, such as, for example, those
commercially available from EMC, Auspex Systems, or Network
Appliance. Workgroup servers 94 and 98 can be implemented with
servers commercially available from Dell Computer Corporation or
Compaq Computers. Each of process servers 91 and 95, file servers
93 and 96, and workgroup servers 94 and 98 can run any suitable
operating system, such as, for example, SUN SOLARIS 5.6 from Sun
Microsystems or WINDOWS NT from Microsoft Corporation.

Process servers 91 and 95, file servers 93 and 96, and data
server 97 may provide the primary processing capability required
to implement the functionality of data receiving component 30 of
product data standardization system 10. This includes the
functionality of sender module 50, receiver module 52, unpacker
module 54, transformer module 56, cleanser module 58, validator
module 60, signature matcher module 62, loader module 64, account
and unit module 66, and map segment module 68. In one
embodiment, each of modules 50 through 68 can be implemented, at
least in part, as one or more programs running on process servers
91 and 95, file servers 93 and 96, and data server 97, with each
module being initiated when its functionality is required, as
described herein.

Workgroup server 94 and process server 95 may provide the
primary processing capability required to implement the
functionality of data analysis component 34 of product data
standardization system 10. This includes the functionality of
combination matcher module 70, guesser module 72, manufacturer
assigner module 74, manufacturer auditor module 76, SPC assigner
module 78, SPC auditor module 80, SPC creator module 82, and SPC
creation auditor module 84. In one embodiment, each of modules
70 through 84 can be implemented, at least in part, as one or
more programs running on workgroup server 94 and process server
95, with each module being initiated when its functionality is
required, as described herein.

The storage facilities of computer-based system 90 may
include data server 97 and file servers 93 and 96. In one
embodiment, data server 97 can be implemented with SIM Server
Class equipment commercially available from Sun Microsystems.
Data server 97 can run a SOLARIS operating system. Furthermore,
data server 97 can run any suitable database application, such as
an ORACLE database. Data servers comprise or support associated
memories, which can include any one or a combination of suitable
storage media, such as random access memory (RAM), read-only
memory (ROM), disk, tape storage, or other suitable volatile
and/or non-volatile data storage media.

Data server 97 and file servers 93 and 96 may provide the
primary storage capability required to implement the
functionality of ODS database 32 and data warehouse 38 of product
data standardization system 10. The associated memories of data
server 97 and file servers 93 and 96 receive, store, and forward
the various data and information input into and generated within
product data standardization system 10. Thus, for example, the
associated memories may store raw data, standardized identifiers/
descriptions, known product data, and standardized product data.

A plurality of workstations 99 (separately labeled 99a, 99b,
and 99c) are connected to workgroup server 94. Each workstation
99 can be a computer having one or more suitable input devices
(e.g., a keypad, touch screen, mouse, etc.) and output devices
(e.g., a video monitor, audio speaker, etc.) for communicating
data/information associated with the operation of product data
standardization system 10, including digital data, visual
information, or audio information. Each workstation 99 may
include fixed or removable storage media, such as magnetic
computer disc, optical disc, CD-ROM, or other suitable media to
both receive output from and provide input to product data
standardization system 10. Workstations 99 may provide the
primary interface capability required to implement the
functionality of analyst interfaces 36 of product data
standardization system 10.

Screen Display For Manufacturer Assignment and Audit 

Fig. 7 illustrates an exemplary screen display 100 for
manufacturer assignment and audit, according to an embodiment of
the present invention. Screen display 100 may be generated by
one or both of manufacturer assigner and auditor modules 74 and
76 of data analysis component 34.

In screen display 100, one or more entries 102 may be
presented to an analyst in order to identify and audit respective
manufacturers under consideration. Each entry 102 may comprise
values in various fields 104. As shown, these fields 104 include
manufacturer (MFC), raw manufacturer name (RAW MFC NAME), brand
(BRAND), product identifier (PRODUCT ID), product name (PRODUCT
NAME), distributor (DISTRI), pack (PACK), and pack size (PACK
SIZE).

A number of possible matches 106 are also presented to the
analyst. These matches 106 can be generated by guesser module 72
of data analysis component 34. Each possible match 106 may
comprise values in various fields 108. As shown, these fields
108 include manufacturer identifier (MFC ID), brand (BRAND),
brand identifier (BRAND ID), brand name (BRAND NAME),
distribution center identifier (DC ID), distributor (DIST),
manufacturer name (MFC NAME), etc.

Screen Display For SPC Assignment and Audit

Fig. 8 illustrates an exemplary screen display 110 for
standardized product code assignment and audit, according to an
embodiment of the present invention. Screen display 110 may be
generated by one or both of SPC assigner and auditor modules 78
and 80 of data analysis component 34.

In screen display 110, one or more entries 112 may be
presented to an analyst in order to identify and audit respective
products under consideration. Each entry 112 may comprise values
in various fields 114. As shown, these fields 114 include
standardized product code (SPC or IPC), standardized product code
name (IPC NAME), manufacturer name (MFC NAME), manufacturer
number (MFC NO), product name (PRODUCT NAME), product number
(PRODUCT NO), etc.

A number of possible matches 116 are also presented to the
analyst. These matches 116 can be generated by guesser module 72
of data analysis component 34. Each possible match 116 may
comprise values in various fields 118. As shown, these fields
118 include standardized product code number (IPC NO),
standardized product code name (IPC NAME), manufacturer name (MFC
NAME), brand name (BRAND NAME), manufacturer number (MFC NO),
product name (PRODUCT NAME), product number (PRODUCT NO), etc.
Also provided are a number of scores which can reflect the
measure of confidence that a particular possible match 116 is a
match for a product under consideration.

Screen Display For SPC Creation

Fig. 9 illustrates an exemplary screen display 120 for
standardized product code creation, according to an embodiment of
the present invention. Screen display 120 may be generated by
one or both of SPC creator and creation auditor modules 82 and 84
of data analysis component 34.

In screen display 120, one or more entries 122 may be
presented to an analyst in order to create and audit the creation
of respective new standardized product codes. Each entry 122 may
comprise values in various fields 124. As shown, these fields
124 include product name (PRODUCT NAME), pack (PACK), and pack
size (PK SIZE), raw brand description (RAW BRAND), brand name
(BRAND NAME), etc.

Method For Standardizing Product Data 

Fig. 10 is a flow diagram of an exemplary method 150 for
standardizing product data, according to an embodiment of the
present invention. Method 150, which may correspond to the
operation of product data standardization system 10, can be
performed for each product specified in raw data received from
one of a number of diverse data sources.

Method 150 begins at step 152 where product data
standardization system 10 receives the raw data for a product at
data receiving component 30. At step 154, data receiving
component 30 formats the raw data into a form that is suitable
for further processing. This may include unpacking,
transforming, cleansing, and validating the raw data, as
performed by unpacker module 54, transformer module 56, cleanser
module 58, and validator module 60, respectively. The received
data may include information relating to a number of products
moving through one or more supply chains.

At step 156, signature matcher module 62 compares a raw
description or "signature" for the product against various
signatures previously seen by system 10. At step 158, signature
matcher module 62 determines whether there is any match for the
signature of the product under consideration. If there is a
match, method 150 moves to step 186 (as described below).
Alternatively, if there is no match for the signature, the data
for the product is forwarded (via ODS database 32) to data
analysis component 34.

At step 160, combination matcher module 70 compares various
fields for the product against predetermined combinations of
fields in another attempt to match the product. At step 162,
combination matcher module 70 determines whether the field values
for the product under consideration match any of the
predetermined combinations. If there is a match, then method 150
moves to step 174 where data analysis component 34 assigns the
standardized product code for that combination to the product,
after which the assigned standardized product code is audited at
step 176 (as described below). Otherwise, if it is determined at
step 162 that there is no match, then at step 166 manufacturer
assigner module 74 assigns a manufacturer to the product. In one
embodiment, manufacturer assigner module 74 may display at least
a portion of the received data to an analyst for assistance in
making the assignment. After a manufacturer has been assigned,
manufacturer auditor module 76 audits the assignment at step 168.
In one embodiment, manufacturer auditor module 76 may display the
assigned manufacturer to the same or a different analyst for
assistance in the audit. This serves as a check on the
manufacturer assignment.

At step 170, data analysis component 34 determines whether
the manufacturer assigned to the product is new to system 10. If
the manufacturer is new, then method 150 moves to step 180 (as
described below). Alternatively, if the manufacturer is not new,
then at step 172 guesser module 72 generates one or more guesses
as to the identity of the product. To accomplish this, guesser
module 72 may consider the product line of the assigned
manufacturer. In one embodiment, guesser module 72 may output up
to a predetermined number (e.g., twenty) of guesses for the
product. Along with each guess, guesser module 72 may also
generate a confidence measure (expressed as a percentage value)
as to the level of confidence that the guess is correct. A more
detailed description of the operation of guesser module 72 is
provided below.

At step 174, SPC assigner module 78 assigns a standardized
product code to the product, for example, using the guesses and
respective confidence measures generated by guesser module 72. In
one embodiment, SPC assigner module 78 may display the guesses
and respective confidence measures to an analyst for assistance
in making the assignment of a standardized product code. After a
standardized product code has been assigned, SPC auditor module
80 audits the assignment at step 176. In one embodiment, SPC
auditor module 80 may display the assigned standardized product
code to the same or a different analyst for assistance in the
audit. This serves as a check on the standardized product code
assignment.

At step 178, data analysis component 34 determines whether a
new standardized product code is required. If a new standardized
product code is not required, method 150 moves to step 186 where
the assignment of a standardized product code for the product
under consideration is finalized, after which method 150 ends.
Otherwise, if a new standardized product code is required, then
method 150 moves to step 184 (as described below).

Returning again to step 170, if the manufacturer assigned to
the product is new to the system, then a new standardized product
code should be created. At steps 180 and 182, SPC creator module
82, using various information for the product under
consideration, assigns a brand identifier and packing standard
for the new standardized product code. In one embodiment, SPC
creator module 82 may display various information for the product
under consideration to an analyst for assistance in making the
assignment of a brand identifier and packing standard. After a
new standardized product code has been created, then at step 184
SPC creation auditor module 84 audits the newly created
standardized product code. In one embodiment, SPC creation
auditor module 84 displays the newly created standardized product
code to the same or a different analyst for assistance in the
auditing. This serves as a check on the creation of a new
standardized product code. At step 186, the assignment of a
standardized product code for the product under consideration is
finalized, after which method 150 ends.
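The branching of method 150 can be summarized by a short sketch that
returns the sequence of step numbers visited for a given set of
decisions. This is only an illustration under stated assumptions: the
function name and its boolean parameters are hypothetical and stand in
for the determinations made at steps 158, 162, 170, and 178.

```python
def method_150_path(signature_match, combination_match=False,
                    manufacturer_is_new=False, new_spc_required=False):
    """Return the step numbers of method 150 visited for the given decisions."""
    steps = [152, 154, 156, 158]          # receive, format, signature compare/decide
    if signature_match:
        return steps + [186]              # known signature: finalize immediately
    steps += [160, 162]                   # combination compare/decide
    if not combination_match:
        steps += [166, 168, 170]          # assign and audit manufacturer, check if new
        if manufacturer_is_new:
            # New manufacturer: create a new code, audit it, then finalize.
            return steps + [180, 182, 184, 186]
        steps.append(172)                 # generate guesses
    steps += [174, 176, 178]              # assign code, audit, check if new code needed
    if new_spc_required:
        steps.append(184)                 # audit the new code
    return steps + [186]                  # finalize
```

For example, a product with no signature match but a combination match
passes through steps 160 and 162 and then goes directly to steps 174,
176, 178, and 186.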

Method For Signature Match 

Fig. 11 is a flow diagram of an exemplary method 350 for
performing a signature match, according to an embodiment of the
present invention. In one embodiment, method 350 may correspond
to the operation of signature matcher module 62 of data
receiving component 30 and can be performed for an unidentified
raw description.

Method 350 begins at step 352 where signature matcher module
62 receives a raw description for a product. This raw
description may include a number of different fields for various
signatures, such as a distribution center (DC) signature, a DC
product signature, an account signature, and a transaction
signature. These fields include a DC number code (dc_num), DC
product number (dc_prod_num), DC product name (dc_prod_name), DC
pact quantity (dc_pact_qty), DC package size (dc_pack_size), DC
brand (dc_brand), DC vendor number (dc_vendor_num), DC vendor
name (dc_vendor_name), DC sell by unit of measure (dc_sb_uom),
DC price by unit of measure (dc_pb_uom), and account number code
(account_num).

At step 354, signature matcher module 62 compares a DC
number code from the raw description against DC number codes
previously stored in product data standardization system 10. At
step 356, signature matcher module 62 determines whether any of
the stored DC number codes match the DC number code under
consideration. If there is a match, then at step 358 signature
matcher module 62 retrieves an associated DC signature identifier
(dc_sig_id) which is assigned to the stored DC number code; this
retrieved DC signature identifier is then used as the DC
signature identifier for the DC number code under consideration.
Otherwise, if there is no match, then at step 360 signature
matcher module 62 assigns a new DC signature identifier to the DC
number code under consideration.

At step 362, signature matcher module 62 compares a DC
product number, a DC product name, a DC pact quantity, a DC
package size, a DC brand, a DC vendor number, a DC vendor name, a
DC sell by unit of measure, and a DC price by unit of measure for
a DC product signature in the raw description against like fields
previously stored in product data standardization system 10. At
step 364, signature matcher module 62 determines whether there is
a match of the previously stored fields and the fields for the DC
product signature under consideration. If there is a match, then
at step 366 signature matcher module 62 retrieves an associated
DC product signature identifier (dc_prod_sig_id) which is
assigned to the stored fields; this retrieved DC product
signature identifier is then used as the DC product signature
identifier for the DC product signature under consideration.
Otherwise, if there is no match, then at step 368 signature
matcher module 62 assigns a new DC product signature identifier
to the DC product signature under consideration.

At step 370, signature matcher module 62 compares a DC
number code and an account number code for an account signature
in the raw description against like fields previously stored in
product data standardization system 10. At step 372, signature
matcher module 62 determines whether there is a match of the
previously stored fields and the fields for the account signature
under consideration. If there is a match, then at step 374
signature matcher module 62 retrieves an associated account
signature identifier (acct_sig_id) which is assigned to the
stored fields; this retrieved account signature identifier is
then used as the account signature identifier for the account
signature under consideration. Otherwise, if there is no match,
then at step 376 signature matcher module 62 assigns a new
account signature identifier to the account signature under
consideration.

At step 378, signature matcher module 62 determines whether
the DC signature identifier and DC product signature identifier
assigned for the present raw description already exist within
product data standardization system 10. If these already exist,
then at step 380 signature matcher module 62 retrieves an
associated transaction signature identifier (trx_sig_id) which is
assigned for that DC signature identifier and DC product
signature identifier. Otherwise, if the DC signature identifier
and DC product signature identifier do not already exist, then at
step 382 signature matcher module 62 assigns a new transaction
signature identifier to the raw description under consideration.
Afterwards, method 350 ends.
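The retrieve-or-assign pattern that method 350 repeats for each
signature can be sketched as follows. The field names are those of the
raw description; the classes IdRegistry and SignatureMatcher, the use of
in-memory dictionaries, and the keying of the transaction signature on
the pair of identifiers are assumptions made for illustration only.

```python
from itertools import count

# Fields compared for the DC product signature, as named in the raw description.
DC_PRODUCT_FIELDS = (
    "dc_prod_num", "dc_prod_name", "dc_pact_qty", "dc_pack_size", "dc_brand",
    "dc_vendor_num", "dc_vendor_name", "dc_sb_uom", "dc_pb_uom",
)


class IdRegistry:
    """Hypothetical store: returns a stored identifier or assigns a new one."""

    def __init__(self):
        self._ids = {}
        self._next_id = count(1)

    def get_or_assign(self, key):
        """Return (identifier, was_known) for a hashable key."""
        if key in self._ids:
            return self._ids[key], True
        self._ids[key] = next(self._next_id)
        return self._ids[key], False


class SignatureMatcher:
    """Sketch of the four lookups performed in method 350."""

    def __init__(self):
        self.dc = IdRegistry()
        self.dc_prod = IdRegistry()
        self.acct = IdRegistry()
        self.trx = IdRegistry()

    def match(self, raw):
        dc_sig_id, _ = self.dc.get_or_assign(raw["dc_num"])
        dc_prod_key = tuple(raw[name] for name in DC_PRODUCT_FIELDS)
        dc_prod_sig_id, _ = self.dc_prod.get_or_assign(dc_prod_key)
        acct_key = (raw["dc_num"], raw["account_num"])
        acct_sig_id, _ = self.acct.get_or_assign(acct_key)
        # Assumption: the transaction signature is keyed on the identifier pair.
        trx_sig_id, seen_before = self.trx.get_or_assign((dc_sig_id, dc_prod_sig_id))
        return {
            "dc_sig_id": dc_sig_id,
            "dc_prod_sig_id": dc_prod_sig_id,
            "acct_sig_id": acct_sig_id,
            "trx_sig_id": trx_sig_id,
            "seen_before": seen_before,
        }
```

In this sketch, a raw description whose fields have all been seen before
yields the same identifiers as the earlier occurrence, which is the
condition under which a signature match can be reported.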

Method For Combination Matching For a Product

Fig. 12 is a flow diagram of an exemplary method 200 for
matching a combination of fields for a product, according to an
embodiment of the present invention. In one embodiment, method
200 may correspond to one aspect of operation for combination
matcher module 70 of data analysis component 34 and can be
performed for an unidentified product specified in raw data.

Method 200 begins at step 202 where combination matcher
module 70 compares values in various fields for the product under
consideration against multiple sets of values for the following
combination of fields: distributor identifier, product number,
product name, brand code, pack, and pack size. At step 204, for
all of these fields, combination matcher module 70 determines
whether the values of any set exactly match the values of the
product under consideration. If so, then method 200 moves to
step 216 where combination matcher module 70 generates an
indicator that there is a "match" for the product under
consideration, after which method 200 ends.

Alternatively, if at step 204 it is determined that none of
the sets exactly match the product under consideration in that
particular combination of fields, then at step 206 combination
matcher module 70 compares values for the product under
consideration against multiple sets of values for the following
combination of fields: distributor identifier, manufacturer
product number, product name, brand code, pack, and pack size.
At step 208, for all of these fields, combination matcher module
70 determines whether the values of any set exactly match the
values of the product under consideration. If so, then method
200 moves to step 216 where combination matcher module 70
generates an indicator that there is a "match" for the product
under consideration, after which method 200 ends.

Otherwise, if at step 208 it is determined that none of the
sets exactly match the product under consideration in that
particular combination of fields, then at step 210 combination
matcher module 70 compares values for the product under
consideration against multiple sets of values for the following
combination of fields: product view identifier, product number,
product name, brand code, and sell by unit of measure (UOM). At
step 212, for all of these fields, combination matcher module 70
determines whether the values of any set exactly match the values
of the product under consideration. If so, then method 200 moves
to step 216 where combination matcher module 70 generates an
indicator that there is a "match" for the product under
consideration, after which method 200 ends.

Alternatively, if at step 212 it is determined that none of
the sets exactly match the product under consideration in that
particular combination of fields, then at step 214 combination
matcher module 70 generates an indicator that there is "no
match" for the product under consideration. Method 200 then ends.
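The cascade of method 200 amounts to trying three field combinations in
order and stopping at the first exact match. A minimal sketch follows;
the dictionary keys (distributor_id, mfr_product_num, and so on) and the
function name are hypothetical stand-ins for the fields named above, and
known products are assumed to be held as plain dictionaries.

```python
# The three combinations of method 200, in the order they are tried.
PRODUCT_COMBINATIONS = [
    ("distributor_id", "product_num", "product_name",
     "brand_code", "pack", "pack_size"),
    ("distributor_id", "mfr_product_num", "product_name",
     "brand_code", "pack", "pack_size"),
    ("product_view_id", "product_num", "product_name",
     "brand_code", "sell_by_uom"),
]


def combination_match(product, known_sets, combinations=PRODUCT_COMBINATIONS):
    """Return (True, fields) for the first combination in which every field
    of some known set exactly equals the product's value, else (False, None)."""
    for fields in combinations:
        for known in known_sets:
            # A field missing on either side never counts as a match.
            if all(f in product and f in known and product[f] == known[f]
                   for f in fields):
                return True, fields
    return False, None
```

Requiring every field to be present on both sides avoids treating two
missing values as equal, which would otherwise produce spurious matches.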

Method For Combination Matching For a Manufacturer

Fig. 13 is a flow diagram of an exemplary method 250 for
matching a combination of fields for a manufacturer, according to
an embodiment of the present invention. In one embodiment,
method 250 may correspond to one aspect of operation for
combination matcher module 70 of data analysis component 34 and
can be performed for an unidentified manufacturer specified in
raw data.

Method 250 begins at step 252 where combination matcher
module 70 compares values in various fields for the manufacturer
under consideration against multiple sets of values for the
following combination of fields: brand code, product view
identifier, and raw manufacturer name. At step 254, for all of
these fields, combination matcher module 70 determines whether
the values of any set exactly match the values of the
manufacturer under consideration. If so, then method 250 moves
to step 272 where combination matcher module 70 generates an
indicator that there is a "match" for the manufacturer under
consideration, after which method 250 ends.

Alternatively, if at step 254 it is determined that none of
the sets exactly match the manufacturer under consideration in
that particular combination of fields, then at step 256
combination matcher module 70 compares values for the
manufacturer under consideration against multiple sets of values
for the following combination of fields: brand code, raw
manufacturer name, and distributor identifier. At step 258, for
all of these fields, combination matcher module 70 determines
whether the values of any set exactly match the values of the
manufacturer under consideration. If so, then method 250 moves
to step 272 where combination matcher module 70 generates an
indicator that there is a "match" for the manufacturer under
consideration, after which method 250 ends.

Otherwise, if at step 258 it is determined that none of the
sets exactly match the manufacturer under consideration in that
particular combination of fields, then at step 260 combination
matcher module 70 compares values for the manufacturer under
consideration against multiple sets of values for the following
combination of fields: brand code and raw manufacturer name. At
step 262, for all of these fields, combination matcher module 70
determines whether the values of any set exactly match the values
of the manufacturer under consideration. If so, then method 250
moves to step 272 where combination matcher module 70 generates
an indicator that there is a "match" for the manufacturer under
consideration, after which method 250 ends.

On the other hand, if at step 262 it is determined that none
of the sets exactly match the manufacturer under consideration in
that particular combination of fields, then at step 264
combination matcher module 70 compares values for the
manufacturer under consideration against multiple sets of values
for the following combination of fields: brand code and product
view identifier. At step 266, for all of these fields,
combination matcher module 70 determines whether the values of
any set exactly match the values of the manufacturer under
consideration. If so, then method 250 moves to step 272 where
combination matcher module 70 generates an indicator that there
is a "match" for the manufacturer under consideration, after
which method 250 ends.

Otherwise, if at step 266 it is determined that none of the
sets exactly match the manufacturer under consideration in that
particular combination of fields, then at step 268 combination
matcher module 70 compares values for the manufacturer under
consideration against multiple sets of values for the following
combination of fields: brand code and distributor identifier. At
step 270, for all of these fields, combination matcher module 70
determines whether the values of any set exactly match the values
of the manufacturer under consideration. If so, then method 250
moves to step 272 where combination matcher module 70 generates
an indicator that there is a "match" for the manufacturer under
consideration, after which method 250 ends.

Alternatively, if at step 270 it is determined that none of
the sets exactly match the manufacturer under consideration in
that particular combination of fields, then at step 274
combination matcher module 70 generates an indicator that there
is "no match" for the manufacturer under consideration. Method
250 then ends.
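Because every comparison in method 250 is an exact match on a fixed
combination of fields, one possible implementation indexes the known
records by combination so that each test becomes a set lookup rather
than a scan. The sketch below is an assumption about how this could be
done, not a description of combination matcher module 70; the key names
and both function names are hypothetical.

```python
# The five combinations of method 250, from most to least specific.
MANUFACTURER_COMBINATIONS = [
    ("brand_code", "product_view_id", "raw_mfr_name"),
    ("brand_code", "raw_mfr_name", "distributor_id"),
    ("brand_code", "raw_mfr_name"),
    ("brand_code", "product_view_id"),
    ("brand_code", "distributor_id"),
]


def build_index(known_records):
    """Map each combination to the set of value tuples seen in known records."""
    index = {fields: set() for fields in MANUFACTURER_COMBINATIONS}
    for record in known_records:
        for fields in MANUFACTURER_COMBINATIONS:
            if all(f in record for f in fields):
                index[fields].add(tuple(record[f] for f in fields))
    return index


def first_matching_combination(record, index):
    """Return the first combination whose values exactly match a known
    record, or None when there is no match for any combination."""
    for fields in MANUFACTURER_COMBINATIONS:
        if all(f in record for f in fields):
            if tuple(record[f] for f in fields) in index[fields]:
                return fields
    return None
```

Returning the combination that matched, rather than a bare flag, lets a
caller record how specific the match was.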

Method For Generating Guesses

Fig. 14 is a flow diagram of an exemplary method 300 for
generating a guess as to the identity of a product, according to
an embodiment of the present invention. In one embodiment,
method 300 may correspond to the operation of guesser module 72
of data analysis component 34 and can be performed for a product
specified in raw data.

Method 300 begins at step 302 where guesser module 72
performs a pattern match for the product under consideration.
The pattern match is a threshold-based pattern comparison of
various fields. In a pattern match, for a number of fields, the
values for a product under consideration are compared against the
values of various known products to determine the similarity
therebetween. A separate fraction may be assigned to indicate
the similarity of values for each field. For at least some of
the known products, guesser module 72 may generate a measure of
confidence which indicates the overall similarity of the product
under consideration against a particular known product. The
measure of confidence can be a normalized value (between 0% and
100%) that is monotonically related to similarity. A method for
performing a pattern match is described below in more detail.
The pattern match may yield a number of "matches" for the
product under consideration. A match is defined as a known
product with a confidence measure that lies above a predetermined
threshold (e.g., 80%). A match may be considered to be a "unique
match" if one of the following two conditions is met: (a) the
confidence measure for the match is the highest possible value
(e.g., 100%) or (b) the confidence measure for the match is above
a threshold (e.g., 90%) higher than the threshold for a simple
match, and the next best match produced by the pattern match has
a confidence measure which is significantly lower. The higher
threshold for a unique match, which is still below the highest
possible value, recognizes that the data for a product under
consideration may be slightly "corrupted" due to random errors
(e.g., typos, scanning errors, etc.). The higher threshold for
unique matches treats such random errors as inconsequential.
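The distinction between a simple match and a unique match can be
illustrated with the example thresholds given above (80% and 90%). The
text does not quantify "significantly lower," so the margin of ten
percentage points used here is purely an assumption, as are the function
and constant names and the choice to compare against the next best
candidate whether or not it exceeds the simple threshold.

```python
SIMPLE_THRESHOLD = 80      # example threshold for a simple match (percent)
UNIQUE_THRESHOLD = 90      # example higher threshold for a unique match
HIGHEST_VALUE = 100        # highest possible confidence measure
SIGNIFICANT_MARGIN = 10    # assumed meaning of "significantly lower"


def classify_matches(confidences):
    """Classify the best candidate given {known_product: confidence_percent}.

    Returns ("unique match" | "simple match" | "no match", product_or_None).
    """
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] <= SIMPLE_THRESHOLD:
        return "no match", None
    best_product, best = ranked[0]
    next_best = ranked[1][1] if len(ranked) > 1 else 0
    if best >= HIGHEST_VALUE:
        return "unique match", best_product                      # condition (a)
    if best > UNIQUE_THRESHOLD and best - next_best >= SIGNIFICANT_MARGIN:
        return "unique match", best_product                      # condition (b)
    return "simple match", best_product
```

With these assumed values, confidences of 95% and 70% give a unique
match for the first product, whereas 95% and 93% give only a simple
match because the runner-up is too close.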

At step 304, in light of the results of the pattern match,
guesser module 72 determines whether there is a unique match for
the product under consideration. If a known product is a unique
match for the product under consideration, a standardized product
code for that known product can be assigned to the product under
consideration and method 300 ends. Alternatively, if no known
product is a unique match for the product under consideration,
then at step 306 guesser module 72 determines whether there is a
simple match for the product under consideration. A simple match
signifies a sufficiently high correlation between the product
under consideration and a known product, but the differences are
greater than those expected from point-wise or random errors
(e.g., typographical errors). Such differences may be
attributable to, for example, abbreviations (e.g., "breaded chx
breast" for "breaded chicken breast").

Accordingly, if it is determined at step 306 that there is a
match for the product under consideration, method 300 moves to
step 310 where the logic of product data standardization system
10 is updated to reflect the match. For example, a "match"
between the product under consideration and a known product is
sufficient for a "signature match" of the two products. Thus,
the logic of signature matcher module 62 in data receiving
component 30 can be updated accordingly. In the future, data
analysis component 34 will then be able to immediately assign a
standardized product code to any product having the same raw
description or "signature" as the product currently under
consideration. Afterwards, method 300 ends.

Otherwise, if it is determined at step 306 that there is no
match for the product under consideration, then there are
substantial differences, or uniqueness was not satisfied, between
the received data for that product and the data for any known
products. At step 308, using the guesses and corresponding
confidence measures for the product under consideration, product
data standardization system 10 assigns a standardized product
code to the product. In one embodiment, guesser module 72 may
display various information to an analyst to assist in the
assignment. After a standardized product code has been assigned
to the product under consideration, the logic of data analysis
component 34 is updated to reflect the assignment at step 310.
Thus, in the future, data analysis component 34 will be able to
immediately assign a standardized product code to any product
having the same data as the product currently under
consideration. Method 300 then ends.
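The updating of logic at step 310 can be thought of as remembering each
resolved signature so that later occurrences bypass the slower path. The
following sketch shows one way to express this; the class
LearnedAssignments, the function standardize, and the resolve callback
(standing in for guessing and analyst review) are hypothetical names
introduced only for illustration.

```python
class LearnedAssignments:
    """Hypothetical store of signature-to-code assignments learned so far."""

    def __init__(self):
        self._code_by_signature = {}

    def lookup(self, signature):
        """Return the learned code for a signature, or None if unseen."""
        return self._code_by_signature.get(signature)

    def learn(self, signature, code):
        """Record an assignment so the signature matches immediately later."""
        self._code_by_signature[signature] = code


def standardize(signature, logic, resolve):
    """Return a standardized product code, consulting learned logic first."""
    code = logic.lookup(signature)
    if code is None:
        code = resolve(signature)      # slow path: guesses and/or an analyst
        logic.learn(signature, code)   # update the logic for future products
    return code
```

In this arrangement the slow path is invoked at most once per distinct
signature, which reflects the stated goal of immediate assignment for
products having the same data.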

Method For Pattern Match 

Fig. 15 is a flow diagram of an exemplary method 400 for
performing a pattern match, according to an embodiment of the
present invention. In one embodiment, method 400 may correspond
to one aspect of the operation of guesser module 72 of data
analysis component 34. Method 400 considers a product specified
in raw data received by product data standardization system 10.
As described herein, this raw data may be parsed into a number of
distinct fields (e.g., product name, product number, manufacturer
name, manufacturer number, brand code, distributor name,
distributor number, packing size, etc.) with a similarity value
computed for each field. Method 400 implements a sequential
comparison of each field and computes a single, composite value
for the set of fields. This single composite value is an overall
measure of similarity of the raw data with a particular known
product.

Method 400 begins at step 402 where a field is selected. At
step 404, for this field, guesser module 72 determines the
similarity between a data field of the product under
consideration and the analogous field of the known product.
In one embodiment, this is accomplished by a pattern matching
routine that computes the fractional similarity between the two
fields. In one embodiment, this fraction may have a numerical
value between zero and one, with a higher numerical value
generally indicating more similarity.

At step 408, guesser module 72 determines whether the
numerical value of the fraction meets a minimum threshold for the
relevant field. If the minimum threshold is not met, method 400
ends. The use of a minimum threshold for each field recognizes
that if there is not sufficient similarity between the values of
the product under consideration and the known product, then there
is no reason to proceed further. For example, if the numerical
value for a product name field is relatively low, it is very
likely that the product under consideration is not the same as
the known product; accordingly, the package size field does not
need to be considered.

If at step 408 it is determined that the minimum threshold
has been met, then at step 410 guesser module 72 determines
whether there are any other fields which should be considered.
If there is another field, then method 400 returns to step 402
where the next field is selected. Steps 402 through 410 are
repeated for each relevant field until either the numerical value
for a particular field does not meet the minimum threshold for
that field or there is no other field to be considered.

At step 412, guesser module 72 generates a confidence 
measure representing the similarity between the product under 
consideration and the known product. In one embodiment, the 
confidence measure can be calculated by weighting the assigned 
fraction for each field and aggregating the weighted values. As 
described herein, the confidence measure can be a normalized 
value (between 0% and 100%) that is monotonically related to 
similarity. Method 400 then ends. 
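A minimal sketch of method 400 is given below. The text does not specify
the pattern matching routine, so Python's difflib.SequenceMatcher is
substituted here as one readily available way to obtain a fraction
between zero and one. The field list, minimum thresholds, weights, and
function names are illustrative assumptions, as is the use of a
normalized weighted average for the confidence measure.

```python
from difflib import SequenceMatcher

# (field, minimum fraction, weight); all values are illustrative assumptions.
FIELD_RULES = [
    ("product_name", 0.6, 5),
    ("brand_code", 0.5, 3),
    ("pack_size", 0.5, 2),
]


def field_similarity(a, b):
    """Fractional similarity in [0, 1] between two field values."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pattern_match(candidate, known, rules=FIELD_RULES):
    """Return a confidence percentage, or None if any field falls below
    its minimum threshold (in which case later fields are not examined)."""
    weighted_sum = 0.0
    total_weight = 0
    for field, minimum, weight in rules:
        fraction = field_similarity(candidate[field], known[field])
        if fraction < minimum:
            return None                     # early exit, as at step 408
        weighted_sum += weight * fraction
        total_weight += weight
    return 100.0 * weighted_sum / total_weight


def rank_guesses(candidate, known_products, limit=20):
    """Return up to `limit` (code, confidence) guesses, best first."""
    scored = []
    for code, known in known_products.items():
        confidence = pattern_match(candidate, known)
        if confidence is not None:
            scored.append((code, confidence))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Placing the most discriminating field first in the rule list lets the
early exit discard most non-matching known products after a single
comparison. The default limit of twenty follows the example number of
guesses mentioned earlier for guesser module 72.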



With the computer system and method described herein, the
present invention standardizes the raw data generated by diverse
data sources during the movement of products across various
supply chains, for example, in the food service industry. In the
standardized product data, like products are identified by the
same identifier or description. With the standardized product
data, participants in the food service industry, including
manufacturers, distributors, and operators, are able to optimize
efficiency in their operations, for example, in the areas of
marketing, distribution, and purchasing. Accordingly, the
present invention enables streamlining of the supply chains for
products in the food service industry.

Although particular embodiments of the present invention
have been shown and described, it will be obvious to those
skilled in the art that changes and modifications may be made
without departing from the present invention in its broader
aspects, and therefore, the appended claims are to encompass
within their scope all such changes and modifications that fall
within the true scope of the present invention.

WHAT IS CLAIMED IS:

1. A computer system for generating standardized product
data, the computer system comprising:

a database operable to maintain data for a plurality of
known products, each known product associated with a respective
standardized product code; and

a processing facility coupled to the database, the
processing facility operable to receive raw data for an
unidentified product from a plurality of diverse data sources
each of which has its own separate identifier for the
unidentified product, to compare the raw data for the
unidentified product against the data for the plurality of known
products, and if there is a match between the raw data for the
unidentified product and the data for one of the plurality of
known products, to assign the respective standardized product
code of the matching known product to the unidentified product.

2. The computer system of Claim 1 wherein:

the raw data comprises a raw description for the
unidentified product;

the data maintained in the database comprises a separate
stored description for each of the plurality of known products;
and

the processing facility is operable to compare the raw
description for the unidentified product against the stored
descriptions for each of the plurality of known products.

3. The computer system of Claim 1 wherein:

the raw data comprises a number of field values for the
unidentified product;

the data maintained in the database comprises separate field
values for each of the plurality of known products; and

the processing facility is to compare a predetermined
combination of the field values for the unidentified product
against corresponding field values for each of the plurality of
known products.

4. The computer system of Claim 1 wherein the processing
facility is operable to parse the raw data into a number of
separate field values for the unidentified product.

5. The computer system of Claim 1 wherein the processing
facility is operable to generate at least one guess as to a known
product which is a possible match for the unidentified product.

6. The computer system of Claim 5 wherein the processing
facility is operable to generate a confidence measure for the at
least one guess.

7. The computer system of Claim 1 further comprising an
interface coupled to the processing facility, the interface
operable to present the assigned standardized product code to an
analyst for auditing.

8. A method performed on a computer system for generating
standardized product data, the method comprising:

maintaining data for a plurality of known products, each
known product associated with a respective standardized product
code;

receiving raw data for an unidentified product from a
plurality of diverse data sources, each data source having its
own separate identifier for the unidentified product;

comparing the raw data for the unidentified product against
the data for the plurality of known products; and

if there is a match between the raw data for the
unidentified product and the data for one of the plurality of
known products, assigning the respective standardized product
code of the matching known product to the unidentified product.

9. The method of Claim 8 comprising presenting the
assigned standardized product code to an analyst for auditing.

10. The method of Claim 8 comprising parsing the raw data
into a number of separate field values for the unidentified
product.

11. The method of Claim 8 wherein the raw data comprises a
raw description for the unidentified product and the maintained
data comprises a separate stored description for each of the
plurality of known products, and wherein comparing comprises
comparing the raw description for the unidentified product
against the stored descriptions for each of the plurality of
known products.

12. The method of Claim 8 wherein the raw data comprises a
number of field values for the unidentified product and the
maintained data comprises separate field values for each of the
plurality of known products, and wherein comparing comprises
comparing a predetermined combination of the field values for the
unidentified product against corresponding field values for each
of the plurality of known products.

13. The method of Claim 8 comprising generating at least
one guess as to a known product which is a possible match for the
unidentified product.

14. The method of Claim 13 comprising presenting the at
least one guess to an analyst for assigning a standardized
product code to the unidentified product.

15. The method of Claim 13 wherein the raw data comprises a
number of field values for the unidentified product and the
maintained data comprises separate field values for each of the
plurality of known products, and wherein generating comprises
performing a pattern comparison of the field values for the
unidentified product against the field values for each known
product.

16. A computer system for generating standardized product 
data, the computer system comprising: 

a database operable to maintain data for a plurality of 
known products, each known product associated with a respective 
standardized product code, the data maintained in the database 
comprising a separate stored description and set of field values 
for each of the plurality of known products; 

a processing facility coupled to the database and operable 
to: 

receive raw data for an unidentified product from a 
plurality of diverse data sources each of which has its own 
separate identifier for the unidentified product, the raw 
data comprising a raw description and set of field values 
for the unidentified product, 

compare the raw description for the unidentified 
product against the stored descriptions for each of the 
plurality of known products, 

if the raw description for the unidentified product 
does not match any of the stored descriptions for the 
plurality of known products, compare a predetermined 
combination of the field values for the unidentified product 
against corresponding field values for each of the plurality 
of known products, and 

if the raw description for the unidentified product 
matches a stored description for one of the plurality of 
known products, or if all of the field values for the 
unidentified product match the corresponding field values 
for one of the plurality of known products for the 
predetermined combination, assign the respective 
standardized product code of the matching known product to 
the unidentified product. 
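
The order of operations recited for the processing facility (description comparison first, field-combination comparison only on a description miss, then code assignment) can be sketched as follows. This is a minimal illustration under assumed record shapes: the `description`, `fields`, and `std_code` keys are hypothetical, and exact comparison after simple normalization stands in for whatever comparison an actual system would use.

```python
def normalize(text):
    """Uppercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(str(text or "").upper().split())


def assign_standard_code(raw, known_products, combination):
    """Return (std_code, stage), where stage names the comparison that matched."""
    raw_description = normalize(raw.get("description"))

    # Stage 1: raw description against every stored description.
    if raw_description:
        for product in known_products:
            if normalize(product.get("description")) == raw_description:
                return product["std_code"], "description"

    # Stage 2: reached only when no stored description matched. Every field in
    # the predetermined combination must be present and equal.
    raw_fields = raw.get("fields", {})
    for product in known_products:
        stored = product.get("fields", {})
        if combination and all(
            normalize(raw_fields.get(name))
            and normalize(raw_fields.get(name)) == normalize(stored.get(name))
            for name in combination
        ):
            return product["std_code"], "fields"

    return None, "unmatched"
```

Returning the stage alongside the code records which comparison produced the assignment, which may be useful when an assigned code is later audited.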



17. The computer system of Claim 16 wherein the processing 
facility is operable to parse the raw data into the field values 
for the unidentified product. 

18. The computer system of Claim 16 comprising an interface 
coupled to the processing facility, the interface operable to 
present the assigned standardized product code to an analyst for 
auditing. 



19. The computer system of Claim 16 wherein the processing 
facility is operable to generate at least one guess as to a known 
product which is a possible match for the unidentified product. 

20. The computer system of Claim 19 wherein the processing 
facility is operable to generate a confidence measure for the at 
least one guess. 

21. The computer system of Claim 20 comprising an interface 
coupled to the processing facility, the interface operable to 
present the at least one guess and confidence measure to an 
analyst for assignment of a standardized product code. 
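
No formula for the confidence measure is recited. Purely as an assumed example, a confidence value could be computed as the weighted share of fields on which a guess agrees with the unidentified product, with the guesses then ordered for presentation to the analyst. The weights and field names below are hypothetical.

```python
# Hypothetical integer weights: fields that identify a product more reliably count for more.
WEIGHTS = {"manufacturer": 4, "brand": 2, "pack_size": 3, "category": 1}


def confidence(unknown, candidate, weights=WEIGHTS):
    """Weighted share of fields on which the candidate agrees, from 0.0 to 1.0."""
    total = sum(weights.values())
    agreed = sum(
        weight
        for field, weight in weights.items()
        if unknown.get(field) is not None and unknown.get(field) == candidate.get(field)
    )
    return agreed / total


def review_queue(unknown, candidates, weights=WEIGHTS):
    """Pair each guess with its confidence, highest first, for analyst review."""
    scored = [(c["std_code"], confidence(unknown, c, weights)) for c in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Sorting by confidence places the most likely guess at the top of the list the analyst sees, while leaving the actual assignment of the standardized product code to the analyst.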

22. A method performed on a computer system for generating 
standardized product data, the method comprising: 

receiving raw data for an unidentified product from a 
plurality of diverse data sources each of which has its own 
separate identifier for the unidentified product, the raw data 
comprising a raw description and set of field values for the 
unidentified product; 

comparing the raw description for the unidentified product 
against the stored descriptions for each of the plurality of 
known products; 

if the raw description for the unidentified product does not 
match any stored description, comparing a predetermined 
combination of the field values for the unidentified product 
against corresponding field values for each of the plurality of 
known products; and 

if the raw description for the unidentified product matches 
a stored description for one of the plurality of known products, 
or if all of the field values for the unidentified product match 
the corresponding field values for one of the plurality of known 
products for the predetermined combination, assigning the 
respective standardized product code of the matching known 
product to the unidentified product. 

ABSTRACT OF THE DISCLOSURE 

A computer system is provided for generating standardized 
product data. The computer system includes a database which 
maintains data for a plurality of known products, each known 
product associated with a respective standardized product code. 
A processing facility, coupled to the database, receives raw data 
for an unidentified product from a plurality of diverse data 
sources, each of which has its own separate identifier for the 
unidentified product. The processing facility compares the raw 
data for the unidentified product against the data for the 
plurality of known products. If there is a match between the raw 
data for the unidentified product and the data for one of the 
plurality of known products, the processing facility assigns the 
respective standardized product code of the matching known 
product to the unidentified product. 
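
Because each data source has its own separate identifier for the same product, one way to make an assignment reusable is to record a cross-reference from each (source, source identifier) pair to the standardized product code. The sketch below is an assumption for illustration; the abstract does not describe how assignments are stored, and the class and method names are hypothetical.

```python
class CrossReference:
    """Maps (data source, source-specific identifier) pairs to standardized codes."""

    def __init__(self):
        self._codes = {}

    def record(self, source, source_item_id, std_code):
        """Remember the standardized code assigned to one source's identifier."""
        self._codes[(source, source_item_id)] = std_code

    def resolve(self, source, source_item_id):
        """Return the previously assigned code, or None if the pair is unknown."""
        return self._codes.get((source, source_item_id))

    def identifiers_for(self, std_code):
        """List every (source, identifier) pair mapped to a standardized code."""
        return sorted(key for key, code in self._codes.items() if code == std_code)
```

With such a table, later records from a source whose identifier has already been resolved need not be compared against the known products again.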

DECLARATION FOR PATENT APPLICATION 
AND POWER OF ATTORNEY 

As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below adjacent to my name. 

I believe I am the original, first and sole inventor (if only one name is listed below) or an original, 
first and joint inventor (if plural names are listed below) of subject matter (process, machine, 
manufacture, or composition of matter, or an improvement thereof) which is claimed and for which a 
patent is sought by way of the application entitled 

System and Method For Product Data Standardization 

which (check) ☒ is attached hereto. 

□ and is amended by the Preliminary Amendment attached hereto. 

□ was filed on ______ as Application Serial No. ______ 

□ and was amended on ______ (if applicable). 

I hereby state that I have reviewed and understand the contents of the above identified specification, 
including the claims, as amended by any amendment referred to above. 

I acknowledge the duty to disclose information, which is material to patentability as defined in Title 
37, Code of Federal Regulations, § 1.56. 

I hereby claim foreign priority benefits under Title 35, United States Code, § 119(a)-(d) of any foreign 
application(s) for patent or inventor's certificate or any PCT international application(s) designating at 
least one country other than the United States of America listed below and have also identified below 
any foreign application(s) for patent or inventor's certificate or any PCT international application(s) 
designating at least one country other than the United States of America filed by me on the same 
subject matter having a filing date before that of the application(s) of which priority is claimed: 



Prior Foreign Application(s)                                    Priority Claimed 
Number    Country    Day/Month/Year Filed                       Yes    No 
N/A (include prior foreign application if applicable)           □      □ 



I hereby claim the benefit under Title 35, United States Code, § 119(e) of any United States 
provisional application(s) listed below: 



Provisional Application Number                                  Filing Date 
N/A (include provisional application if applicable) 

I hereby claim the benefit under Title 35, United States Code, § 120 of any United States 
application(s) or PCT international application(s) designating the United States of America listed 
below and, insofar as the subject matter of each of the claims of this application is not disclosed in the 
prior application(s) in the manner provided by the first paragraph of Title 35, United States Code, § 
112, I acknowledge the duty to disclose information, which is material to patentability as defined in 
Title 37, Code of Federal Regulations, § 1.56, which became available between the filing date of the 
prior application(s) and the national or PCT international filing date of this application: 



Application Serial No.    Filing Date    Status (patented, pending, abandoned) 
N/A 







I hereby appoint the following attorney(s) and/or agent(s) to prosecute this application and to transact 
all business in the United States Patent and Trademark Office connected therewith: 



Alan H. MacPherson (24,423); Brian D. Ogonowsky (31,988); David W. Heid (25,875); Norman R. 
Klivans (33,003); Edward C. Kwok (33,938); David E. Steuber (25,557); Michael Shenker (34,250); 
Stephen A. Terrile (32,946); Peter H. Kang (40,350); Ronald J. Meetin (29,089); Ken John Koestner 
(33,004); Omkar K. Suryadevara (36,320); David T. Millers (37,396); Michael P. Adams (34,763); 
Robert B. Morrill (43,817); Michael J. Halbert (40,633); Gary J. Edwards (41,008); James E. Parsons 
(34,691); Daniel P. Stewart (41,332); Philip W. Woo (39,880); John T. Winburn (26,822); Tom Chen 
(42,406); Fabio E. Marino (43,339); William W. Holloway (26,182); Don C. Lawrence (31,975); 
Marc R. Ascolese (42,268); Carmen C. Cook (42,433); David G. Dolezal (41,711); Roberta P. Saxon 
(43,087); Mary Jo Bertani (42,321); Dale R. Cook (42,434); Sam G. Campbell (42,381); Matthew J. 
Brigham (44,047); Hugh H. Matsubayashi (43,779); Patrick D. Benedicto (40,909); T.J. Singh 
(39,535); Shireen Irani Bacon (40,494); Rory G. Bens (44,028); George Wolken, Jr. (30,441); John A. 
Odozynski (28,769); Cameron K. Kerrigan (44,826); Paul E. Lewkowicz (44,870); Theodore P. 
Lopez (44,881); Mayankkumar M. Dixit (44,064); Eric Stephenson (38,321); Christopher Allenby 
(45,906); David C. Hsia (46,235) and Mark J. Rozman (42,117). 

Please address all correspondence and telephone calls to: 

Philip W. Woo 
Attorney for Applicants 
SKJERVEN, MORRILL, MacPHERSON, FRANKLIN & FRIEL LLP 

25 Metro Drive, Suite 700 
San Jose, California 95110-1349 

Telephone: 415 217-6000 
Facsimile: 415 434-0646 

I declare that all statements made herein of my own knowledge are true, all statements made herein on 
information and belief are believed to be true, and all statements made herein are made with the 
knowledge that whoever, in any matter within the jurisdiction of the Patent and Trademark Office, 
knowingly and willfully falsifies, conceals, or covers up by any trick, scheme, or device a material 
fact, or makes any false, fictitious or fraudulent statements or representations, or makes or uses any 
false writing or document knowing the same to contain any false, fictitious or fraudulent statement or 
entry, shall be subject to the penalties including fine or imprisonment or both as set forth under 18 
U.S.C. 1001, and that violations of this paragraph may jeopardize the validity of the application or 
this document, or the validity or enforceability of any patent, trademark registration, or certificate 
resulting therefrom. 

Attorney Docket No.: M-8603 US 

Full name of first inventor: Dejan IsWNenov 

Inventor's Signature: [illegible]    Date: [illegible] 

Residence: San Francisco, California 

Post Office Address: 100 First Street, Ste 100-353    Citizenship: USA 
                     San Francisco, CA 94105 

Full name of second inventor: Yongwon (nmi) Lee 

Inventor's Signature: [illegible]    Date: [illegible] 

Residence: San Jose, California 

Post Office Address: 7172 Rainbow Drive    Citizenship: Republic of South Korea 
                     San Jose, CA 95129 



Full name of third inventor: Todd J. Gettelfinger 

Inventor's Signature: 

Residence: Chicago, Illinois 

Post Office Address: 1735 North Mohawk Street    Citizenship: USA 
                     Chicago, IL 60614 

Full name of fourth inventor: Shermann L. Min 

Inventor's Signature: [illegible]    Date: [illegible] 

Residence: Pacifica, California 

Post Office Address: 77 Paloma Avenue #203    Citizenship: USA 
                     Pacifica, CA 94044 



Full name of fifth inventor: Dawn T. Ohlendorf 

Inventor's Signature: [illegible]    Date: [illegible] 

Residence: Redwood City, California 

Post Office Address: 3 Delmar Court    Citizenship: USA 
                     Redwood City, CA 94603 

Full name of sixth inventor: Donna B. Tobias 

Inventor's Signature: [illegible]    Date: [illegible] 

Residence: Portola Valley, California 

Post Office Address: 4131 Alpine Road    Citizenship: USA 
                     Portola Valley, CA 94028 

Full name of seventh inventor: John R. Gilmer, II 

Inventor's Signature: [illegible]    Date: 

Residence: San Anselmo, California 

Post Office Address: 10 Greensburgh Lane    Citizenship: USA 
                     San Anselmo, CA 94960 